Playing Music with Images

For our last activity of the semester, we did something interesting -- as the title says, we played music with images. To do this, we used a technique known as template matching. As the name implies, template matching involves determining which part of an image matches, or is similar to, a particular template[1], and there are many ways to do it. In this activity, we did it by convolving an image of interest with a template image.

Note that since we are matching templates, the template in the template image should look exactly like the features we want to find in our image of interest. Also, since we take the Fourier transform of both the image of interest and the template image and multiply them together, the two images have to be the same size; if this condition is not met, the code used in this activity will not work.

The code used to do template matching is shown below. Notice that in the code, we took the Fourier transform of the image of interest and of the template image, multiplied one by the complex conjugate of the other, and then took the inverse Fourier transform. We did it this way because convolving in real space involves an integral, so the program would get slower and slower as the images grow in size. Working in frequency space avoids this, since a convolution in real space is just a multiplication in frequency space.

import numpy as np
from numpy.fft import fft2, ifft2, fftshift

def template_match(u1, u2):
    # Fourier transform the image of interest (u1) and the template (u2).
    U1 = fft2(u1)
    U2 = fft2(u2)

    # Multiply by the conjugate of the template's spectrum and invert;
    # fftshift centres the result so the peaks line up with the matches.
    U3 = U1 * np.conj(U2)
    u3 = fftshift(ifft2(U3))

    return np.abs(u3)

To test this code, we apply template matching to an image. Shown in Figures 1a and 1b are the image of interest and the template image, respectively.

Figure 1: (a) and (b) show the image of interest and the template image on which
we apply template matching. (c) shows the immediate result of doing so, while
(d) shows what happens when we threshold the image in (c).

Applying the code shown above, we get Figure 1c. Notice that Figure 1c looks like a mix of Figures 1a and 1b, which is expected, since convolving two functions gives you something that resembles a little bit of both. However, some parts of the image are brighter than the rest. If we threshold the image to isolate them, we get Figure 1d. Comparing this to Figure 1a, we see that these bright spots correspond to the locations of 'A' in the image of interest. Our code works!
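For reference, a minimal sketch of this step might look like the following, using the template_match function above (the file names and the 0.9 cutoff are placeholders, not the exact values used here):

import numpy as np
import matplotlib.pyplot as plt

# Hypothetical file names standing in for Figures 1a and 1b (assumed RGB).
image = plt.imread('text_image.png')[:, :, :3].mean(axis=-1)
template = plt.imread('template_A.png')[:, :, :3].mean(axis=-1)

result = template_match(image, template)

# Keep only the brightest correlation peaks; the cutoff needs tuning.
mask = result > 0.9 * result.max()
rows, cols = np.nonzero(mask)   # locations of the matched 'A's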

To take this to the next level, we added some noise to our original image, as seen in Figure 2a. If we do template matching with the same template image shown in Figure 1b and threshold the result, we get Figure 2b. Again it works. Nice!
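For completeness, the noise here can be sketched as simple additive Gaussian noise (the standard deviation below is an arbitrary choice, not necessarily the level used in Figure 2a):

import numpy as np

# Add zero-mean Gaussian noise to the image of interest from the previous sketch.
noisy = image + np.random.normal(0.0, 0.2 * image.max(), size=image.shape)

result_noisy = template_match(noisy, template)
mask_noisy = result_noisy > 0.9 * result_noisy.max()   # same threshold as before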

Figure 2: (a) shows Figure 1a with noise added, while (b) shows the result of applying
template matching with Figure 1b as the template image and thresholding.

Since things are working as expected, we now proceed to what we really want to do -- play music from an image. To do this, we first made a mapping: we assigned each musical pitch to a number and each note duration to a color. We let 0 be nothing (a rest), while 1 through 8 form a C-major scale from C4 up to C5. We then assigned magenta, green, red, blue, and orange to whole, half, quarter, eighth, and sixteenth notes, respectively. Arranging these to 'create' music, we get Figure 3.
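In code, this mapping is just a pair of lookup tables; a sketch is shown below (the frequencies follow from standard A4 = 440 Hz tuning, and the durations are in beats; the colour names stand in for whatever pixel values are actually used in Figure 3):

# Pitch mapping: 0 is nothing (a rest), 1-8 are the C-major scale from C4 to C5.
# Frequencies in Hz, equal temperament with A4 = 440 Hz.
PITCHES = {
    0: 0.0,      # rest
    1: 261.63,   # C4
    2: 293.66,   # D4
    3: 329.63,   # E4
    4: 349.23,   # F4
    5: 392.00,   # G4
    6: 440.00,   # A4
    7: 493.88,   # B4
    8: 523.25,   # C5
}

# Duration mapping: colour -> length in beats.
DURATIONS = {
    'magenta': 4.0,    # whole note
    'green':   2.0,    # half note
    'red':     1.0,    # quarter note
    'blue':    0.5,    # eighth note
    'orange':  0.25,   # sixteenth note
}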

Figure 3: Example of music turned into an image. In this case, we play scales.

Say we want to 'play' this image. To do this in a program, we have to find a way to know which note each character is and what color it is drawn in. This is much easier in HSV space, so we did this part entirely in HSV space, where saturation was used to identify the notes and hue to identify how long each one is played.
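Getting the HSV channels is a one-liner with, for example, matplotlib (assuming the score image is loaded as an RGB array in the range [0, 1]; the file name is a placeholder):

import matplotlib.pyplot as plt
from matplotlib.colors import rgb_to_hsv

score = plt.imread('scales.png')[:, :, :3]   # drop any alpha channel
hsv = rgb_to_hsv(score)

hue = hsv[:, :, 0]          # tells us the colour, and hence the duration
saturation = hsv[:, :, 1]   # separates the coloured notes from the background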

To know which notes are played in the image, we first have to know where they are located. To do this, we can dilate the characters horizontally and vertically, then extract the edges. Doing this, we get Figure 4.
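As a rough sketch of this step (here with scipy.ndimage; the saturation threshold and structuring-element lengths are guesses that would depend on the actual character size):

import numpy as np
from scipy import ndimage

# Binary mask of the notes: anything sufficiently saturated is a character.
notes = saturation > 0.3

# Dilate with long horizontal / vertical structuring elements so that the
# characters in each row / column merge into single blobs.
dilated_h = ndimage.binary_dilation(notes, structure=np.ones((1, 51)))
dilated_v = ndimage.binary_dilation(notes, structure=np.ones((51, 1)))

# The edges of these blobs mark where each group of notes starts and ends.
row_edges = np.diff(dilated_h.astype(int), axis=0) != 0
col_edges = np.diff(dilated_v.astype(int), axis=1) != 0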

Figure 4: Process of obtaining the location of each note in the image. We dilate the image
(a) horizontally and (b) vertically, and extract the edges. Doing this, we get the
(c) horizontal and (d) vertical location of each note in the image.

It seems to be working fine. After this, we again used the code shown earlier to determine which pitch each note in the image corresponds to. For each note, we do template matching nine times, once for each of the nine symbols, and take the maximum value of each resulting image. The symbol whose result has the largest maximum determines the pitch of the note.
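In rough code, again using template_match from above (here templates would be a dict of digit images, one per symbol, each padded to the same size as the cropped note; this is a sketch rather than the exact code used):

def identify_pitch(note_image, templates):
    # Correlate the cropped note with every digit template and record
    # the peak of each correlation result.
    scores = {symbol: template_match(note_image, template).max()
              for symbol, template in templates.items()}

    # The symbol with the largest peak is taken as the pitch of the note.
    return max(scores, key=scores.get)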

To determine how long each note is played, we compared the hue value of each note to a set of reference values. Whichever reference hue the note's hue is closest to determines the duration it is played.
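A sketch of this comparison is shown below, using matplotlib's 0-1 hue convention (the reference hues here are the nominal ones for each colour and would be measured from the real image in practice):

# Nominal reference hues (0-1) for each colour used in the score.
REFERENCE_HUES = {
    'red':     0.00,
    'orange':  0.08,
    'green':   0.33,
    'blue':    0.67,
    'magenta': 0.83,
}

def identify_duration(note_hue):
    # Pick the colour whose reference hue is closest to the note's mean hue
    # (hue wrap-around is ignored here for simplicity), then look up its
    # duration in beats from the DURATIONS table defined earlier.
    colour = min(REFERENCE_HUES, key=lambda c: abs(REFERENCE_HUES[c] - note_hue))
    return DURATIONS[colour]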

Doing all of this, we get:



Nice! It's working as intended.

This was really challenging, and it made me want to give up before I got it done. Thankfully, it worked once I isolated each note! Thanks!

References:
[1] Adaptive Vision Studio documentation, "Template Matching." https://docs.adaptive-vision.com/4.7/studio/machine_vision_guide/TemplateMatching.html. Accessed 09 Dec 2018.
