Basic Algorithms in Computer Vision

In this topic, we will look at how a computer represents images. We will introduce a library that is very useful for computer vision tasks, and we will learn how some of these tasks and algorithms work and how to code them.

Image Terminology

To understand computer vision, we first need to know how images work and how a computer interprets them.

A computer understands an image as a set of numbers grouped together. To be more specific, the image is seen as a two-dimensional array, a matrix that contains values from 0 to 255 (0 being for black and 255 for white in grayscale images) representing the values of the pixels of an image (pixel values), as shown in the following example:

Figure 2.1: Image representation without and with pixel values

In the image on the left-hand side, the number 3 is shown in a low resolution. On the right-hand side, the same image is shown along with the value of every pixel. As this value rises, a brighter color is shown, and if the value decreases, the color gets darker.

This particular image is in grayscale, which means it is only a two-dimensional array of values from 0 to 255, but what about colored images? Colored images (or red/green/blue (RGB) images) have three layers of two-dimensional arrays stacked together. Every layer represents one color each and putting them all together forms a colored image.

The preceding image has 14x14 pixels in its matrix. In grayscale, it is represented as 14x14x1, as it only has one matrix, and one channel. For the RGB format, the representation is 14x14x3 as it has 3 channels. From this, all that computers need to understand is that the images come from these pixels.
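As a minimal sketch of these shapes, the following NumPy snippet builds a blank 14x14 grayscale image and a 14x14 RGB image. The array names are illustrative only. Note that NumPy reports a single-channel image as (14, 14) rather than (14, 14, 1).

```python
import numpy as np

# A 14x14 grayscale image: one 2D array of 8-bit values (0 = black, 255 = white).
gray = np.zeros((14, 14), dtype=np.uint8)
gray[3:11, 6:8] = 255  # draw a white vertical bar on the black background

# A 14x14 RGB image: three 2D arrays (channels) stacked along the last axis.
rgb = np.zeros((14, 14, 3), dtype=np.uint8)
rgb[..., 0] = 255  # fill the first (red) channel, leaving green and blue at 0

print(gray.shape)  # one channel
print(rgb.shape)   # three channels
print(gray.min(), gray.max())
```

Indexing a single pixel, for example gray[3, 6], returns its pixel value, while rgb[3, 6] returns the three channel values of that pixel.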

OpenCV

OpenCV is an open source computer vision library that has C++, Python, and Java interfaces and supports Windows, Linux, macOS, iOS, and Android.

For all the algorithms mentioned in this chapter, we will be using OpenCV through its Python interface. If you want to practice one of these algorithms, we recommend using Google Colab. You will need Python 3.5 or above, OpenCV, and NumPy to carry on with this chapter. To display images on our screens, we will use Matplotlib. Both of these are great libraries for AI.

Basic Image Processing Algorithms

In order for a computer to understand an image, the image has to be processed first. There are many algorithms that can be used to process images and the output depends on the task at hand.

Some of the most basic algorithms are:

  • Thresholding
  • Morphological transformations
  • Blurring

Thresholding

Thresholding is commonly used to simplify how an image is visualized by both the computer and the user in order to make analysis easier. It is based on a value that the user sets and every pixel is converted to white or black depending on whether the value of every pixel is higher or lower than the set value. If the image is in grayscale, the output image will be white and black, but if you choose to keep the RGB format for your image, the threshold will be applied for every channel, which means it will still output a colored image.

There are different methods for thresholding, and these are some of the most used ones:

  1. Simple Thresholding: If the pixel value is lower than the threshold set by the user, this pixel will be assigned a 0 value (black); otherwise, it will be assigned 255 (white). There are also different styles of thresholding within simple thresholding:

    Threshold binary

    Threshold binary inverted

    Truncate

    Threshold to zero

    Threshold to zero inverted

    The different types of thresholds are shown in figure 2.2.

    Figure 2.2: Different types of thresholds

    Threshold binary inverted works like binary but the pixels that were black are white and vice versa. Global thresholding is another name given to binary thresholding under simple thresholding.

    Truncate sets the pixel to the exact value of the threshold if the pixel is above the threshold; otherwise, it keeps the pixel value.

    Threshold to zero outputs the pixel value (which is the actual value of the pixel) if the pixel value is above the threshold value; otherwise, it outputs a black pixel. Threshold to zero inverted does the exact opposite.

    Note

    The threshold value can be modified depending on the image or what the user wants to achieve.

  2. Adaptive Thresholding: Simple thresholding uses a global value as the threshold. If the image has different lighting conditions in some parts, the algorithm does not perform that well. In such cases, adaptive thresholding automatically guesses different threshold values for different regions within the image, giving us a better overall result with varying lighting conditions.

    There are two types of adaptive thresholding:

    Adaptive mean thresholding

    Adaptive Gaussian thresholding

    The difference between adaptive thresholding and simple thresholding is shown in figure 2.3.

    Figure 2.3: Difference between adaptive thresholding and simple thresholding

    In adaptive mean thresholding, the threshold value is the mean of the neighborhood area, while in adaptive Gaussian thresholding, the threshold value is the weighted sum of the neighborhood values where weights are a Gaussian window.

  3. Otsu's Binarization: In global thresholding, we used an arbitrary value to assign a threshold value. Consider a bimodal image (an image where the pixels are distributed over two dominant regions). How would you choose the correct value? Otsu's binarization automatically calculates a threshold value from the image histogram for a bimodal image. An image histogram is a type of histogram that acts as a graphical representation of the tonal distribution in a digital image:

Figure 2.4: Otsu's thresholding
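To make the idea of calculating a threshold from the histogram more concrete, here is a minimal NumPy sketch of the usual formulation of Otsu's method: try every candidate threshold and keep the one that maximizes the variance between the two resulting classes. The function name otsu_threshold and the synthetic bimodal image are assumptions made for illustration. This is not OpenCV's implementation, and the exact value it returns may differ slightly from the one cv2.threshold reports.

```python
import numpy as np

def otsu_threshold(image):
    """Return a threshold that maximizes the between-class variance (illustrative)."""
    hist = np.bincount(image.ravel(), minlength=256).astype(np.float64)
    prob = hist / hist.sum()              # normalized image histogram
    levels = np.arange(256)

    best_t, best_var = 0, -1.0
    for t in range(1, 256):
        w0 = prob[:t].sum()               # share of pixels with value < t
        w1 = 1.0 - w0                     # share of pixels with value >= t
        if w0 == 0 or w1 == 0:
            continue                      # one class is empty, nothing to compare
        mu0 = (levels[:t] * prob[:t]).sum() / w0   # mean of the darker class
        mu1 = (levels[t:] * prob[t:]).sum() / w1   # mean of the brighter class
        var_between = w0 * w1 * (mu0 - mu1) ** 2
        if var_between > best_var:
            best_t, best_var = t, var_between
    return best_t

# Synthetic bimodal image: one dark cluster (40..60) and one bright cluster (190..210).
dark = np.tile(np.arange(40, 61, dtype=np.uint8), 10)
bright = np.tile(np.arange(190, 211, dtype=np.uint8), 10)
image = np.concatenate([dark, bright]).reshape(20, 21)

t = otsu_threshold(image)
binary = np.where(image >= t, 255, 0).astype(np.uint8)
print("threshold:", t)
print("white pixels:", int((binary == 255).sum()))
```

For this image, any threshold that falls in the gap between the two clusters separates them equally well, so the sketch only guarantees a value somewhere in that gap.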

Exercise 4: Applying Various Thresholds to an Image

Note

As we are training artificial neural networks on Google Colab, we should use the GPU that Google Colab provides us. In order to do that, we would have to go to runtime > Change runtime type > Hardware accelerator: GPU > Save.

All the exercises and activities will be primarily developed in Google Colab. It is recommended to keep a separate folder for different assignments, unless advised not to.

The Dataset folder is available on GitHub in the Lesson02 | Activity02 folder.

In this exercise, we will be loading an image of a subway, to which we will apply thresholding:

  1. Open up your Google Colab interface.
  2. Create a folder for the book, download the Dataset folder from GitHub, and upload it in the folder.
  3. Import the drive and mount it as follows:

    from google.colab import drive

    drive.mount('/content/drive')

    Note

    Every time you use a new Colab notebook, mount the drive to the desired folder.

    Once you have mounted your drive for the first time, you will have to enter the authorization code that you would get by clicking on the URL given by Google and pressing the Enter key on your keyboard:

    Figure 2.5: Image displaying the Google Colab authorization step

  4. Now that you have mounted the drive, you need to set the path of the directory:

    cd /content/drive/My Drive/C13550/Lesson02/Exercise04/

    Note

    The path mentioned in step 4 may change as per your folder setup on Google Drive. The path will always begin with cd /content/drive/My Drive/.

    The Dataset folder must be present in the path you are setting up.

  5. Now you need to import the corresponding dependencies: OpenCV cv2 and Matplotlib:

    import cv2

    from matplotlib import pyplot as plt

  6. Now type the code to load the subway.jpg image, which we are going to process in grayscale using OpenCV and show using Matplotlib:

    Note

    The subway.jpg image can be found on GitHub in the Lesson02 | Exercise04 folder.

    img = cv2.imread('subway.jpg',0)

    plt.imshow(img,cmap='gray')

    plt.xticks([]),plt.yticks([])

    plt.show()

    Figure 2.6: Result of plotting the loaded subway image

  7. Let's apply simple thresholding by using OpenCV methods.

    The method for doing so in OpenCV is called cv2.threshold and it takes four parameters: the image (grayscale), the threshold value (used to classify the pixel values), maxVal, which represents the value to be given if the pixel value is more than (sometimes less than) the threshold value, and the thresholding type. It returns two outputs: the threshold that was used and the thresholded image:

    _,thresh1 = cv2.threshold(img,107,255,cv2.THRESH_BINARY)

    _,thresh2 = cv2.threshold(img,107,255,cv2.THRESH_BINARY_INV)

    _,thresh3 = cv2.threshold(img,107,255,cv2.THRESH_TRUNC)

    _,thresh4 = cv2.threshold(img,107,255,cv2.THRESH_TOZERO)

    _,thresh5 = cv2.threshold(img,107,255,cv2.THRESH_TOZERO_INV)

    titles = ['Original Image','BINARY', 'BINARY_INV', 'TRUNC','TOZERO','TOZERO_INV']

    images = [img, thresh1, thresh2, thresh3, thresh4, thresh5]

    for i in range(6):

        plt.subplot(2,3,i+1),plt.imshow(images[i],'gray')

        plt.title(titles[i])

        plt.xticks([]),plt.yticks([])

    plt.show()

    Figure 2.7: Simple thresholding using OpenCV

  8. We are going to do the same with adaptive thresholding.

    The method for doing so is cv2.adaptiveThreshold and it has three special input parameters and only one output argument. Adaptive method, block size (the size of the neighborhood area), and C (a constant that is subtracted from the mean or weighted mean calculated) are the inputs, whereas you only obtain the thresholded image as the output. This is unlike global thresholding, where there are two outputs:

    th2=cv2.adaptiveThreshold(img,255,cv2.ADAPTIVE_THRESH_MEAN_C,cv2.THRESH_BINARY,71,7)

    th3=cv2.adaptiveThreshold(img,255,cv2.ADAPTIVE_THRESH_GAUSSIAN_C,cv2.THRESH_BINARY,71,7)

    titles = ['Adaptive Mean Thresholding', 'Adaptive Gaussian Thresholding']

    images = [th2, th3]

    for i in range(2):

        plt.subplot(1,2,i+1),plt.imshow(images[i],'gray')

        plt.title(titles[i])

        plt.xticks([]),plt.yticks([])

    plt.show()

    Figure 2.8: Adaptive thresholding using OpenCV

  9. Finally, let's put Otsu's binarization into practice.

    The method is the same as for simple thresholding, cv2.threshold, but with an extra flag, cv2.THRESH_OTSU:

    ret2,th=cv2.threshold(img,0,255,cv2.THRESH_BINARY+cv2.THRESH_OTSU)

    titles = ['Otsu\'s Thresholding']

    images = [th]

    for i in range(1):

        plt.subplot(1,1,i+1),plt.imshow(images[i],'gray')

        plt.title(titles[i])

        plt.xticks([]),plt.yticks([])

    plt.show()

Figure 2.9: Otsu's binarization using OpenCV

Now you are able to apply different thresholding transformations to any image.
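To see what the five flags used in step 7 do to individual pixels, the following NumPy-only sketch reimplements them on a single row of values. The function simple_threshold and its string style names are illustrative and are not part of OpenCV. The sketch assumes that a pixel counts as "above" only when it is strictly greater than the threshold, which is how the OpenCV documentation describes cv2.threshold.

```python
import numpy as np

def simple_threshold(image, thresh, max_val, style):
    """Illustrative NumPy version of the five simple thresholding styles."""
    above = image > thresh  # True where the pixel is strictly above the threshold
    if style == "binary":
        result = np.where(above, max_val, 0)
    elif style == "binary_inv":
        result = np.where(above, 0, max_val)
    elif style == "trunc":
        result = np.where(above, thresh, image)
    elif style == "tozero":
        result = np.where(above, image, 0)
    elif style == "tozero_inv":
        result = np.where(above, 0, image)
    else:
        raise ValueError(f"unknown style: {style}")
    return result.astype(np.uint8)

# One row of sample pixel values around the threshold 107 used in the exercise.
pixels = np.array([[10, 100, 107, 108, 200]], dtype=np.uint8)

for style in ("binary", "binary_inv", "trunc", "tozero", "tozero_inv"):
    print(f"{style:11s}", simple_threshold(pixels, 107, 255, style)[0])
```

Running the loop on a handful of values like this is a quick way to check which style suits an image before applying it to the whole picture.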

Morphological Transformations

Morphological transformations are a set of simple image operations based on the shapes in an image, and they are usually applied to binary images. They are commonly used to differentiate text from the background or any other shapes. They need two inputs: the original image and a structuring element, or kernel, which decides the nature of the operation. The kernel is usually a small matrix that slides over the image; at each position, the output pixel is decided by the image pixels that the kernel covers. Two basic morphological operators are erosion and dilation. Their variant forms are opening and closing. The one that should be used depends on the task at hand:

  • Erosion: When given a binary image, it shrinks the thickness of the object (which is represented by white pixels) by one pixel on both its interior and exterior boundaries. This method can be applied several times. It can be used for different reasons, depending on what you want to achieve, but normally it is used with dilation (which is explained next) in order to get rid of holes or noise. An example of erosion is shown here with the same digit, 3:

Figure 2.10: Example of erosion

  • Dilation: This method does the opposite of erosion. It increases the thickness of the object in a binary image by one pixel both on the interior and the exterior. It can also be applied to an image several times. This method can be used for different reasons, depending on what you want to achieve, but normally it is implemented along with erosion in order to get rid of holes in an image or noise. An example of dilation is shown here (we have implemented dilation on the image several times):

Figure 2.11: Example of dilation

  • Opening: This method performs erosion first, followed by dilation, and it is usually used for removing noise from an image.
  • Closing: This algorithm does the opposite of opening, as it performs dilation first before erosion. It is usually used for removing holes within an object:

Figure 2.12: Examples of opening and closing

As you can see, the opening method removes random noise from the image and the closing method works perfectly in fixing the small random holes within the image. In order to get rid of the holes of the output image from the opening method, a closing method could be applied.

There are more binary operations, but these are the basic ones.
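The four operations above can be sketched with plain NumPy, assuming a square kernel of ones with an odd side length: erosion keeps the minimum of each neighborhood, dilation keeps the maximum, and opening and closing chain the two. The function names and the edge-replicating border handling are choices made for this illustration and are not how OpenCV implements cv2.erode or cv2.dilate.

```python
import numpy as np

def _neighborhoods(image, k):
    """Stack the k*k shifted copies of the image that cover each pixel's neighborhood."""
    pad = k // 2
    padded = np.pad(image, pad, mode="edge")  # assumption: replicate border pixels
    h, w = image.shape
    return np.stack([padded[dy:dy + h, dx:dx + w]
                     for dy in range(k) for dx in range(k)])

def erode(image, k=3):
    return _neighborhoods(image, k).min(axis=0)   # white survives only if all neighbors are white

def dilate(image, k=3):
    return _neighborhoods(image, k).max(axis=0)   # white spreads to any pixel with a white neighbor

def opening(image, k=3):
    return dilate(erode(image, k), k)             # erosion first, then dilation

def closing(image, k=3):
    return erode(dilate(image, k), k)             # dilation first, then erosion

# A clean 5x5 white square on an 11x11 black background.
clean = np.zeros((11, 11), dtype=np.uint8)
clean[3:8, 3:8] = 255

noisy = clean.copy()
noisy[1, 9] = 255   # isolated white speck (noise outside the object)

holed = clean.copy()
holed[5, 5] = 0     # one-pixel hole inside the object

print("opening removed the speck:", np.array_equal(opening(noisy), clean))
print("closing filled the hole:  ", np.array_equal(closing(holed), clean))
```

On this toy image, opening restores the clean square from the noisy one and closing restores it from the one with a hole, which mirrors the behavior shown in figure 2.12.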

Exercise 5: Applying the Various Morphological Transformations to an Image

In this exercise, we will be loading an image of a number, on which we will apply the morphological transformations that we have just learned about:

  1. Open up your Google Colab interface.
  2. Set the path of the directory:

    cd /content/drive/My Drive/C13550/Lesson02/Exercise05/

    Note

    The path mentioned in step 2 may change, as per your folder setup on Google Drive.

  3. Import the OpenCV, Matplotlib, and NumPy libraries. NumPy here is the fundamental package for scientific computing with Python and will help us create the kernels applied:

    import cv2

    import numpy as np

    from matplotlib import pyplot as plt

  4. Now type the code to load the Dataset/three.png image, which we are going to process in grayscale using OpenCV and show using Matplotlib:

    Note

    The three.png image can be found on GitHub in the Lesson02 | Exercise05 folder.

    img = cv2.imread('Dataset/three.png',0)

    plt.imshow(img,cmap='gray')

    plt.xticks([]),plt.yticks([])

    plt.savefig('ex2_1.jpg', bbox_inches='tight')

    plt.show()

    Figure 2.13: Result of plotting the loaded image

  5. Let's apply erosion by using OpenCV methods.

    The method used here is cv2.erode, and it takes three parameters: the image, a kernel that slides through the image, and the number of iterations, which is the number of times that it is executed:

    kernel = np.ones((2,2),np.uint8)

    erosion = cv2.erode(img,kernel,iterations = 1)

    plt.imshow(erosion,cmap='gray')

    plt.xticks([]),plt.yticks([])

    plt.savefig('ex2_2.jpg', bbox_inches='tight')

    plt.show()

    Figure 2.14: Output of the erosion method using OpenCV

    As we can see, the thickness of the figure has decreased.

  6. We are going to do the same with dilation.

    The method used here is cv2.dilate, and it takes three parameters: the image, the kernel, and the number of iterations:

    kernel = np.ones((2,2),np.uint8)

    dilation = cv2.dilate(img,kernel,iterations = 1)

    plt.imshow(dilation,cmap='gray')

    plt.xticks([]),plt.yticks([])

    plt.savefig('ex2_3.jpg', bbox_inches='tight')

    plt.show()

    Figure 2.15: Output of the dilation method using OpenCV

    As we can see, the thickness of the figure has increased.

  7. Finally, let's put opening and closing into practice.

    The method used here is cv2.morphologyEx, and it takes three parameters: the image, the method applied, and the kernel:

    import random

    random.seed(42)

    def sp_noise(image,prob):

        '''

        Add salt and pepper noise to image

        prob: Probability of the noise

        '''

        output = np.zeros(image.shape,np.uint8)

        thres = 1 - prob

        for i in range(image.shape[0]):

            for j in range(image.shape[1]):

                rdn = random.random()

                if rdn < prob:

                    output[i][j] = 0

                elif rdn > thres:

                    output[i][j] = 255

                else:

                    output[i][j] = image[i][j]

        return output

    def sp_noise_on_figure(image,prob):

        '''

        Set random pixels of the bright figure to black

        prob: Probability of the noise

        '''

        output = np.zeros(image.shape,np.uint8)

        thres = 1 - prob

        for i in range(image.shape[0]):

            for j in range(image.shape[1]):

                rdn = random.random()

                if rdn < prob and image[i][j] > 100:

                    output[i][j] = 0

                else:

                    output[i][j] = image[i][j]

        return output

    kernel = np.ones((2,2),np.uint8)

    # Create thicker figure to work with

    dilation = cv2.dilate(img, kernel, iterations = 1)

    # Create noisy image

    noise_img = sp_noise(dilation,0.05)

    # Create image with noise in the figure

    noise_img_on_image = sp_noise_on_figure(dilation,0.15)

    # Apply Opening to image with normal noise

    opening = cv2.morphologyEx(noise_img, cv2.MORPH_OPEN, kernel)

    # Apply Closing to image with noise in the figure

    closing = cv2.morphologyEx(noise_img_on_image, cv2.MORPH_CLOSE, kernel)

    images = [noise_img,opening,noise_img_on_image,closing]

    for i in range(4):

        plt.subplot(1,4,i+1),plt.imshow(images[i],'gray')

        plt.xticks([]),plt.yticks([])

    plt.savefig('ex2_4.jpg', bbox_inches='tight')

    plt.show()

Figure 2.16: Output of the opening method (left) and closing method (right) using OpenCV

Note

The entire code file can be found on GitHub in the Lesson02 | Exercise05 folder.
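The sp_noise helper in step 7 visits every pixel in a Python loop. As an alternative sketch, the same salt-and-pepper idea can be written with NumPy boolean masks, which is usually much faster on large images. The function name sp_noise_vectorized and the use of NumPy's default_rng generator are assumptions for this example; because it draws random numbers differently, it will not reproduce the exact noise pattern of the original function.

```python
import numpy as np

def sp_noise_vectorized(image, prob, seed=42):
    """Add salt-and-pepper noise to a grayscale image using boolean masks."""
    rng = np.random.default_rng(seed)
    rdn = rng.random(image.shape)      # one random number in [0, 1) per pixel
    output = image.copy()
    output[rdn < prob] = 0             # "pepper": black pixels
    output[rdn > 1 - prob] = 255       # "salt": white pixels
    return output

# Uniform gray test image so that every changed pixel is easy to count.
image = np.full((100, 100), 128, dtype=np.uint8)
noisy = sp_noise_vectorized(image, 0.05)

changed_fraction = float((noisy != image).mean())
print("fraction of noisy pixels:", changed_fraction)
```

With prob set to 0.05, roughly 5% of the pixels turn black and another 5% turn white, so about a tenth of the image is affected.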

Blurring (Smoothing)

Image blurring performs convolution over an image with a filter kernel. In simpler terms, a small matrix of weights slides over the image and each pixel is replaced by a weighted combination of its neighbors, which smooths the image. It is useful for removing noise and softening edges:

  • Averaging: In this method, we consider a box filter or kernel that takes the average of the pixels within the area of the kernel, replacing the central element by using convolution over the entire image.
  • Gaussian Blurring: The kernel applied here is Gaussian, instead of the box filter. It is used for removing Gaussian noise in a particular image.
  • Median Blurring: Similar to averaging, but this one replaces the central element with the median value of the pixels of the kernel. It actually has a very good effect on salt-and-pepper noise (that is, visible black or white spots in an image).

In Figure 2.17, we have applied the aforementioned methods:

Figure 2.17: Result of comparing different blurring methods

There are many more algorithms that could be applied, but these are the most important ones.
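The difference between averaging and median blurring is easiest to see on a tiny image with a single "salt" pixel. The following NumPy sketch uses a 3x3 neighborhood; the function names and the edge-replicating border are assumptions for illustration and do not reproduce OpenCV's border handling exactly.

```python
import numpy as np

def neighborhoods(image, k=3):
    """Stack the k*k shifted copies of the image that cover each pixel's neighborhood."""
    pad = k // 2
    padded = np.pad(image, pad, mode="edge")  # assumption: replicate border pixels
    h, w = image.shape
    return np.stack([padded[dy:dy + h, dx:dx + w]
                     for dy in range(k) for dx in range(k)])

def average_blur(image, k=3):
    # Box filter: every pixel becomes the mean of its k*k neighborhood.
    mean = neighborhoods(image.astype(np.float64), k).mean(axis=0)
    return np.rint(mean).astype(np.uint8)

def median_blur(image, k=3):
    # Median filter: every pixel becomes the median of its k*k neighborhood.
    return np.median(neighborhoods(image, k), axis=0).astype(np.uint8)

image = np.full((5, 5), 100, dtype=np.uint8)
image[2, 2] = 255  # a single white "salt" pixel

print(average_blur(image))  # the spike is spread over its neighbors
print(median_blur(image))   # the spike is removed entirely
```

Averaging dilutes the outlier across the neighborhood, whereas the median ignores it, which is why median blurring works so well on salt-and-pepper noise.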

Exercise 6: Applying the Various Blurring Methods to an Image

In this exercise, we will be loading an image of a subway, to which we will apply the blurring methods:

  1. Open up your Google Colab interface.
  2. Set the path of the directory:

    cd /content/drive/My Drive/C13550/Lesson02/Exercise06/

    Note

    The path mentioned in step 2 may be different according to your folder setup on Google Drive.

  3. Import the OpenCV, Matplotlib, and NumPy libraries:

    import cv2

    from matplotlib import pyplot as plt

    import numpy as np

  4. Type the code to load the Dataset/subway.jpg image, which we are going to load in color using OpenCV, convert to RGB, and show using Matplotlib:

    Note

    The subway.jpg image can be found on GitHub in the Lesson02 | Exercise06 folder.

    img = cv2.imread('Dataset/subway.jpg')

    #Method to convert the image to RGB

    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)

    plt.imshow(img)

    plt.xticks([]),plt.yticks([])

    plt.savefig('ex3_1.jpg', bbox_inches='tight')

    plt.show()

    Figure 2.18: Result of plotting the loaded subway image in RGB

  5. Let's apply all the blurring methods:

    The methods applied are cv2.blur, cv2.GaussianBlur, and cv2.medianBlur. All of them take an image as the first parameter. cv2.blur takes only one more argument, the kernel size. cv2.GaussianBlur takes the kernel size and the standard deviations (sigmaX and sigmaY); if both are given as zeros, they are calculated from the kernel size. cv2.medianBlur also takes only one more argument, the kernel size, given as a single odd integer:

    blur = cv2.blur(img,(51,51)) # Apply normal Blurring

    blurG = cv2.GaussianBlur(img,(51,51),0) # Gaussian Blurring

    median = cv2.medianBlur(img,51) # Median Blurring

    titles = ['Original Image','Averaging', 'Gaussian Blurring', 'Median Blurring']

    images = [img, blur, blurG, median]

    for i in range(4):

        plt.subplot(2,2,i+1),plt.imshow(images[i])

        plt.title(titles[i])

        plt.xticks([]),plt.yticks([])

    plt.savefig('ex3_2.jpg', bbox_inches='tight')

    plt.show()

Figure 2.19: Blurring methods with OpenCV

Now you know how to apply several blurring techniques to any image.
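As a rough sketch of what "calculated from the kernel size" means for cv2.GaussianBlur, the following NumPy snippet builds a normalized Gaussian kernel. The default sigma formula is the one given in the OpenCV documentation for getGaussianKernel; the function name gaussian_kernel_1d is an assumption for this example, and OpenCV may use precomputed coefficients for some small kernel sizes, so treat the values as an approximation.

```python
import numpy as np

def gaussian_kernel_1d(ksize, sigma=0.0):
    """Build a normalized 1D Gaussian kernel of odd length ksize (illustrative)."""
    if sigma <= 0:
        # Default documented for OpenCV's getGaussianKernel when sigma is not positive.
        sigma = 0.3 * ((ksize - 1) * 0.5 - 1) + 0.8
    x = np.arange(ksize) - (ksize - 1) / 2.0        # offsets from the kernel center
    kernel = np.exp(-(x ** 2) / (2.0 * sigma ** 2))
    return kernel / kernel.sum()                    # weights sum to 1

k1d = gaussian_kernel_1d(51)     # same kernel size as in the exercise
k2d = np.outer(k1d, k1d)         # a 2D Gaussian kernel is the outer product of two 1D kernels

print("center weight:", k1d[25])
print("edge weight:  ", k1d[0])
print("2D kernel shape:", k2d.shape)
```

Unlike the box filter, the weights fall off with distance from the center, so nearby pixels influence the result more than distant ones.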

Exercise 7: Loading an Image and Applying the Learned Methods

In this exercise, we will be loading an image of a number and we will apply the methods that we have learned so far.

Note

The entire code is available on GitHub in the Lesson02 | Exercise07-09 folder.

  1. Open up a new Google Colab interface, and mount your drive as mentioned in Exercise 4, Applying Various Thresholds to an Image, of this chapter.
  2. Set the path of the directory:

    cd /content/drive/My Drive/C13550/Lesson02/Exercise07/

    Note

    The path mentioned in step 2 may be different according to your folder setup on Google Drive.

  3. Import the corresponding dependencies: NumPy, OpenCV, and Matplotlib:

    import numpy as np #Numpy

    import cv2 #OpenCV

    from matplotlib import pyplot as plt #Matplotlib

    count = 0

  4. Type the code to load the Dataset/number.jpg image, which we are going to process in grayscale using OpenCV and show using Matplotlib:

    Note

    The number.jpg image can be found on GitHub in the Lesson02 | Exercise07-09 | Dataset folder.

    img = cv2.imread('Dataset/number.jpg',0)

    plt.imshow(img,cmap='gray')

    plt.xticks([]),plt.yticks([])

    plt.show()

    Figure 2.20: Result of loading the image with the number

  5. If you want to recognize those digits using machine learning or any other algorithm, you need to simplify the visualization of them. Using thresholding seems to be the first logical step to proceed with this exercise. We have learned some thresholding methods, but the most commonly used one is Otsu's binarization, as it automatically calculates the threshold value without the user providing the details manually.

    Apply Otsu's binarization to the grayscale image and show it using Matplotlib:

    _,th1=cv2.threshold(img,0,255,cv2.THRESH_BINARY+cv2.THRESH_OTSU)

    th1 = (255-th1)

    # This step changes the black with white and vice versa in order to have white figures

    plt.imshow(th1,cmap='gray')

    plt.xticks([]),plt.yticks([])

    plt.show()

    Figure 2.21: Using Otsu's binarization thresholding on the image

  6. In order to get rid of the lines in the background, we need to do some morphological transformations. First, start by applying the opening method:

    open1 = cv2.morphologyEx(th1, cv2.MORPH_OPEN, np.ones((4, 4),np.uint8))

    plt.imshow(open1,cmap='gray')

    plt.xticks([]),plt.yticks([])

    plt.show()

    Figure 2.22: Applying the opening method

    Note

    The lines in the background have been removed completely. Now a number prediction will be much easier.

  7. In order to fill the holes that are visible in these digits, we need to apply the closing method. Apply the closing method to the preceding image:

    close1 = cv2.morphologyEx(open1, cv2.MORPH_CLOSE, np.ones((8, 8), np.uint8))

    plt.imshow(close1,cmap='gray')

    plt.xticks([]),plt.yticks([])

    plt.show()

    Figure 2.23: Applying the closing method

  8. There are still leftovers and imperfections around the digits. In order to remove these, an opening method with a bigger kernel would be the best choice. Now apply the corresponding method:

    open2 = cv2.morphologyEx(close1, cv2.MORPH_OPEN,np.ones((7,12),np.uint8))

    plt.imshow(open2,cmap='gray')

    plt.xticks([]),plt.yticks([])

    plt.show()

    Figure 2.24: Applying the opening method with a kernel of a bigger size

    Depending on the classifier that you use to predict the digits or the conditions of the given image, some other algorithms would be applied.

  9. If you want to predict the numbers, you will need to predict them one by one. Thus, you should split the image into one smaller image per digit.

    Thankfully, OpenCV has a method that helps with this, and it's called cv2.findContours. In order to find contours, the figures need to be white on a black background, which is why we inverted the thresholded image in step 5. This piece of code is larger, but it is only required if you want to predict character by character:

    contours = cv2.findContours(open2, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)[-2] #Find contours ([-2] works whether OpenCV returns 2 or 3 values)

    cntsSorted = sorted(contours, key=cv2.contourArea, reverse=True) #Sort the contours by area

    cntsLength = len(cntsSorted)

    images = []

    for idx in range(cntsLength): #Iterate over the contours

        contour_no = cntsSorted[idx] #Current contour (assumed; this assignment is not shown in the excerpt)

        x, y, w, h = cv2.boundingRect(contour_no) #Get its position and size

        ... # Rest of the code in Github

        images.append([x,sample_no]) #Add the image to the list of images and the X position

    images = sorted(images, key=lambda x: x[0]) #Sort the list of images using the X position

    {…}

    Note

    The entire code with added comments is available on GitHub in the Lesson02 | Exercise07-09 folder.

Figure 2.25: Extracted digits as the output

In the first part of the code, we are finding the contours of the image (the curve joining all the continuous points along the boundary and of the same color or intensity) to find every digit, which we then sort depending on the area of each contour (each digit).

After this, we loop over the contours, cropping the original image with the given contours, ending up with every number in a different image.

After this, we need to have all the images with the same shape, so we adapt the image to a given shape using NumPy and append the image to a list of images along with the X position.

Finally, we sort the list of images using the X position (from left to right, so they remain in order) and plot the results. We also save every single digit as an image so that we can use every digit separately afterward for any task we want.

Congratulations! You have successfully processed an image with text in it, obtained the text, and extracted every single character, and now the magic of machine learning can begin.
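The last two steps described above (bringing every crop to the same shape and sorting by X position) can be sketched with NumPy alone. Everything in this example is hypothetical: the helper pad_to_shape, the 16x16 target shape, the synthetic image, and the bounding boxes, which stand in for the (x, y, w, h) tuples that cv2.boundingRect would return. It is not the code from the book's repository, and it assumes that every crop fits inside the target shape.

```python
import numpy as np

def pad_to_shape(crop, shape):
    """Center a crop on a black canvas of the given (height, width)."""
    canvas = np.zeros(shape, dtype=crop.dtype)
    h, w = crop.shape
    top = (shape[0] - h) // 2
    left = (shape[1] - w) // 2
    canvas[top:top + h, left:left + w] = crop
    return canvas

# Synthetic stand-in for the cleaned binary image: two white blobs as "digits".
image = np.zeros((20, 40), dtype=np.uint8)
image[5:15, 25:30] = 255   # right-hand blob
image[4:16, 5:12] = 255    # left-hand blob

# Hypothetical bounding boxes (x, y, w, h), listed largest to smallest is not required here.
boxes = [(25, 5, 5, 10), (5, 4, 7, 12)]

digits = []
for x, y, w, h in boxes:
    crop = image[y:y + h, x:x + w]                     # cut the digit out of the image
    digits.append((x, pad_to_shape(crop, (16, 16))))   # keep the X position with the crop

digits.sort(key=lambda item: item[0])                  # left to right

for x, digit in digits:
    print("x =", x, "shape =", digit.shape)
```

Keeping the X position next to each crop is what allows the digits to be read back in their original left-to-right order after the contours have been sorted by area.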