
Book: Artificial Vision and Language Processing for Robotics
Authors: Álvaro Morena Alberola, Gonzalo Molina Gallego, Unai Garay Maestre

Basic Algorithms in Computer Vision

In this topic, we will be addressing how images are formed. We will introduce a library that is very useful for performing computer vision tasks and we will learn about the workings of some of these tasks and algorithms and how to code them.

Image Terminology

To understand computer vision, we first need to know how images work and how a computer interprets them.

A computer understands an image as a set of numbers grouped together. To be more specific, the image is seen as a two-dimensional array, a matrix that contains values from 0 to 255 (0 being for black and 255 for white in grayscale images) representing the values of the pixels of an image (pixel values), as shown in the following example:

Figure 2.1: Image representation without and with pixel values

In the image on the left-hand side, the number 3 is shown in a low resolution. On the right-hand side, the same image is shown along with the value of every pixel. As this value rises, a brighter color is shown, and if the value decreases, the color gets darker.

This particular image is in grayscale, which means it is only a two-dimensional array of values from 0 to 255, but what about colored images? Colored images (or red/green/blue (RGB) images) have three layers of two-dimensional arrays stacked together. Every layer represents one color, and putting them all together forms a colored image.

The preceding image has 14x14 pixels in its matrix. In grayscale, it is represented as 14x14x1, as it only has one matrix, and one channel. For the RGB format, the representation is 14x14x3 as it has 3 channels. In short, all a computer sees of an image is these pixel values.
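As a minimal sketch of this representation, the following NumPy snippet builds a tiny grayscale array and stacks it into three channels. The 4x4 size and the pixel values are made up for illustration. Note that NumPy (and OpenCV's Python interface) store a grayscale image with shape (height, width), without the trailing 1.

```python
import numpy as np

# A tiny 4x4 grayscale "image": 0 is black, 255 is white.
gray = np.array([
    [0,   0,   0, 0],
    [0, 255, 255, 0],
    [0, 128, 255, 0],
    [0,   0,   0, 0],
], dtype=np.uint8)

# Stacking three single-channel layers gives a three-channel image.
rgb = np.stack([gray, gray, gray], axis=-1)

print(gray.shape)  # (4, 4)
print(rgb.shape)   # (4, 4, 3)
print(gray[2, 1])  # 128, a mid-gray pixel
```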

OpenCV

OpenCV is an open source computer vision library that has C++, Python, and Java interfaces and supports Windows, Linux, macOS, iOS, and Android.

For all the algorithms mentioned in this chapter, we will be using OpenCV through its Python interface. If you want to practice one of these algorithms, we recommend using Google Colab. You will need Python 3.5 or above, OpenCV, and NumPy to carry on with this chapter. To display the images on our screens, we will use Matplotlib. These are all great libraries for AI work.

Basic Image Processing Algorithms

In order for a computer to understand an image, the image has to be processed first. There are many algorithms that can be used to process images and the output depends on the task at hand.

Some of the most basic algorithms are:

Thresholding
Morphological transformations
Blurring

Thresholding

Thresholding is commonly used to simplify how an image is visualized by both the computer and the user in order to make analysis easier. It is based on a value that the user sets and every pixel is converted to white or black depending on whether the value of every pixel is higher or lower than the set value. If the image is in grayscale, the output image will be white and black, but if you choose to keep the RGB format for your image, the threshold will be applied for every channel, which means it will still output a colored image.

There are different methods for thresholding, and these are some of the most used ones:

Simple Thresholding: If the pixel value is lower than the threshold set by the user, this pixel will be assigned a 0 value (black); otherwise, it will be assigned 255 (white). There are also different styles of thresholding within simple thresholding:
Threshold binary
Threshold binary inverted
Truncate
Threshold to zero
Threshold to zero inverted
The different types of thresholds are shown in Figure 2.2.

Figure 2.2: Different types of thresholds
Threshold binary inverted works like binary, but the pixels that were black are white and vice versa. Simple thresholding is also known as global thresholding, because a single threshold value is applied to the whole image.
Truncate sets the pixel to the threshold value if the pixel value is above the threshold; otherwise, it keeps the original pixel value.
Threshold to zero outputs the pixel value (which is the actual value of the pixel) if the pixel value is above the threshold value; otherwise, it outputs a black pixel (0). Threshold to zero inverted does the exact opposite.
Note
The threshold value can be modified depending on the image or what the user wants to achieve.
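The five rules can be sketched in plain NumPy to make the differences concrete. This is only an illustration under stated assumptions: the function name simple_threshold and the string labels are hypothetical, and the comparison uses "strictly greater than the threshold", which is how OpenCV documents its own threshold types.

```python
import numpy as np

def simple_threshold(img, thresh, max_val, kind):
    """Hypothetical helper: NumPy sketch of the five simple thresholding rules."""
    above = img > thresh
    if kind == "binary":
        result = np.where(above, max_val, 0)
    elif kind == "binary_inv":
        result = np.where(above, 0, max_val)
    elif kind == "trunc":
        result = np.where(above, thresh, img)
    elif kind == "tozero":
        result = np.where(above, img, 0)
    elif kind == "tozero_inv":
        result = np.where(above, 0, img)
    else:
        raise ValueError("unknown kind: " + kind)
    return result.astype(np.uint8)

# One row of sample pixel values around a threshold of 107.
row = np.array([10, 100, 107, 108, 200], dtype=np.uint8)
for kind in ("binary", "binary_inv", "trunc", "tozero", "tozero_inv"):
    print(kind, simple_threshold(row, 107, 255, kind))
# binary     [  0   0   0 255 255]
# binary_inv [255 255 255   0   0]
# trunc      [ 10 100 107 107 107]
# tozero     [  0   0   0 108 200]
# tozero_inv [ 10 100 107   0   0]
```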
Adaptive Thresholding: Simple thresholding uses a global value as the threshold. If the image has different lighting conditions in some parts, the algorithm does not perform that well. In such cases, adaptive thresholding automatically guesses different threshold values for different regions within the image, giving us a better overall result with varying lighting conditions.
There are two types of adaptive thresholding:
Adaptive mean thresholding
Adaptive Gaussian thresholding
The difference between adaptive thresholding and simple thresholding is shown in Figure 2.3.

Figure 2.3: Difference between adaptive thresholding and simple thresholding
In adaptive mean thresholding, the threshold value is the mean of the neighborhood area, while in adaptive Gaussian thresholding, the threshold value is the weighted sum of the neighborhood values where weights are a Gaussian window.
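To see why a local threshold helps under uneven lighting, here is a rough NumPy sketch of adaptive mean thresholding. It is not OpenCV's implementation: the function name, the edge padding, and the small synthetic image are assumptions chosen for illustration. Each pixel is compared with the mean of its neighborhood minus a constant c.

```python
import numpy as np

def adaptive_mean_threshold(img, block_size, c, max_val=255):
    """Sketch: a pixel becomes white if it is brighter than (local mean - c)."""
    pad = block_size // 2
    # Repeat the edge pixels so that border windows are full-sized.
    padded = np.pad(img.astype(np.float64), pad, mode="edge")
    out = np.zeros(img.shape, dtype=np.uint8)
    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            window = padded[i:i + block_size, j:j + block_size]
            if img[i, j] > window.mean() - c:
                out[i, j] = max_val
    return out

# Lighting gets brighter from left to right; two darker "ink" pixels sit in row 2.
img = np.tile(50 + 20 * np.arange(8), (5, 1)).astype(np.uint8)
img[2, 1] -= 40
img[2, 6] -= 40

# A single global threshold blacks out the dark left side and misses the ink on the right.
global_out = np.where(img > 107, 255, 0).astype(np.uint8)
# The local threshold marks only the two ink pixels as black.
adaptive_out = adaptive_mean_threshold(img, block_size=3, c=10)

print(global_out)
print(adaptive_out)
```

In this toy image, the global result turns the three darkest columns black regardless of content, while the adaptive result isolates the two ink pixels.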
Otsu's Binarization: In global thresholding, we used an arbitrary value to assign a threshold value. Consider a bimodal image (an image where the pixels are distributed over two dominant regions). How would you choose the correct value? Otsu's binarization automatically calculates a threshold value from the image histogram for a bimodal image. An image histogram is a type of histogram that acts as a graphical representation of the tonal distribution in a digital image:

Figure 2.4: Otsu's thresholding
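The idea behind Otsu's method can be sketched as a search over all 256 candidate thresholds for the one that best separates the two groups of pixels, measured by the between-class variance of the histogram. The function below is an illustrative NumPy version written for this explanation, not the OpenCV implementation, and the bimodal test image is synthetic.

```python
import numpy as np

def otsu_threshold(img):
    """Sketch of Otsu's method: pick the threshold maximising between-class variance."""
    hist = np.bincount(img.ravel(), minlength=256).astype(np.float64)
    prob = hist / hist.sum()
    levels = np.arange(256)
    best_t, best_var = 0, -1.0
    for t in range(256):
        w0 = prob[:t + 1].sum()   # share of pixels with value <= t
        w1 = 1.0 - w0             # share of pixels with value > t
        if w0 == 0 or w1 == 0:
            continue              # one class is empty, nothing to separate
        mu0 = (levels[:t + 1] * prob[:t + 1]).sum() / w0
        mu1 = (levels[t + 1:] * prob[t + 1:]).sum() / w1
        between = w0 * w1 * (mu0 - mu1) ** 2
        if between > best_var:
            best_t, best_var = t, between
    return best_t

# Synthetic bimodal image: half the pixels are dark (50), half are bright (200).
img = np.concatenate([np.full(100, 50), np.full(100, 200)]).astype(np.uint8).reshape(10, 20)
t = otsu_threshold(img)
binary = np.where(img > t, 255, 0).astype(np.uint8)
print(t, int((binary == 255).sum()))
```

Any threshold between the two peaks separates this image perfectly, so the sketch returns a value in that range and exactly half of the pixels end up white.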

Exercise 4: Applying Various Thresholds to an Image

NOTE

As we are training artificial neural networks on Google Colab, we should use the GPU that Google Colab provides us. In order to do that, we would have to go to runtime > Change runtime type > Hardware accelerator: GPU > Save.

All the exercises and activities will be primarily developed in Google Colab. It is recommended to keep a separate folder for different assignments, unless advised not to.

The Dataset folder is available on GitHub in the Lesson02 | Activity02 folder.

In this exercise, we will be loading an image of a subway, to which we will apply thresholding:

Open up your Google Colab interface.
Create a folder for the book, download the Dataset folder from GitHub, and upload it in the folder.
Import the drive and mount it as follows:
from google.colab import drive
drive.mount('/content/drive')
Note
Every time you use a new Colab notebook, mount the drive to the desired folder.
Once you have mounted your drive for the first time, you will have to enter the authorization code that you would get by clicking on the URL given by Google and pressing the Enter key on your keyboard:

Figure 2.5: Image displaying the Google Colab authorization step
Now that you have mounted the drive, you need to set the path of the directory:
cd /content/drive/My Drive/C13550/Lesson02/Exercise04/
Note
The path mentioned in step 5 may change as per your folder setup on Google Drive. The path will always begin with cd /content/drive/My Drive/.
The Dataset folder must be present in the path you are setting up.
Now you need to import the corresponding dependencies: OpenCV cv2 and Matplotlib:
import cv2
from matplotlib import pyplot as plt
Now type the code to load the subway.jpg image, which we are going to process in grayscale using OpenCV and show using Matplotlib:
Note
The subway.jpg image can be found on GitHub in the Lesson02 | Exercise04 folder.
img = cv2.imread('subway.jpg',0)
plt.imshow(img,cmap='gray')
plt.xticks([]),plt.yticks([])
plt.show()

Figure 2.6: Result of plotting the loaded subway image
Let's apply simple thresholding by using OpenCV methods.
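The method for doing so in OpenCV is called cv2.threshold and it takes four parameters: the image (grayscale), the threshold value (used to classify the pixel values), maxVal, which is the value assigned in the binary types (to pixels above the threshold for THRESH_BINARY, or to the remaining pixels for THRESH_BINARY_INV), and the thresholding type. It returns two values: the threshold that was used and the thresholded image: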
_,thresh1 = cv2.threshold(img,107,255,cv2.THRESH_BINARY)
_,thresh2 = cv2.threshold(img,107,255,cv2.THRESH_BINARY_INV)
_,thresh3 = cv2.threshold(img,107,255,cv2.THRESH_TRUNC)
_,thresh4 = cv2.threshold(img,107,255,cv2.THRESH_TOZERO)
_,thresh5 = cv2.threshold(img,107,255,cv2.THRESH_TOZERO_INV)
titles = ['Original Image','BINARY', 'BINARY_INV', 'TRUNC','TOZERO','TOZERO_INV']
images = [img, thresh1, thresh2, thresh3, thresh4, thresh5]
for i in range(6):
    plt.subplot(2,3,i+1),plt.imshow(images[i],'gray')
    plt.title(titles[i])
    plt.xticks([]),plt.yticks([])
plt.show()

Figure 2.7: Simple thresholding using OpenCV
We are going to do the same with adaptive thresholding.
The method for doing so is cv2.adaptiveThreshold. In addition to the image, the maximum value, and the threshold type, it has three special input parameters: the adaptive method, the block size (the size of the neighborhood area, which must be an odd number), and C (a constant that is subtracted from the mean or weighted mean calculated). You only obtain the thresholded image as the output. This is unlike global thresholding, where there are two outputs:
th2=cv2.adaptiveThreshold(img,255,cv2.ADAPTIVE_THRESH_MEAN_C,cv2.THRESH_BINARY,71,7)
th3=cv2.adaptiveThreshold(img,255,cv2.ADAPTIVE_THRESH_GAUSSIAN_C,cv2.THRESH_BINARY,71,7)
titles = ['Adaptive Mean Thresholding', 'Adaptive Gaussian Thresholding']
images = [th2, th3]
for i in range(2):
    plt.subplot(1,2,i+1),plt.imshow(images[i],'gray')
    plt.title(titles[i])
    plt.xticks([]),plt.yticks([])
plt.show()

Figure 2.8: Adaptive thresholding using OpenCV
Finally, let's put Otsu's binarization into practice.
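The method is the same as for simple thresholding, cv2.threshold, but with an extra flag, cv2.THRESH_OTSU. The threshold value passed in (0 here) is ignored, and the first return value (ret2) holds the threshold that Otsu's method calculated: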
ret2,th=cv2.threshold(img,0,255,cv2.THRESH_BINARY+cv2.THRESH_OTSU)
titles = ['Otsu\'s Thresholding']
images = [th]
for i in range(1):
    plt.subplot(1,1,i+1),plt.imshow(images[i],'gray')
    plt.title(titles[i])
    plt.xticks([]),plt.yticks([])
plt.show()

Figure 2.9: Otsu's binarization using OpenCV

Now you are able to apply different thresholding transformations to any image.

Morphological Transformations

Morphological transformations are a set of simple image operations based on the shapes in an image, and they are usually applied to binary images. They are commonly used to differentiate text from the background or any other shapes. They need two inputs, one being the original image, and the other is called the structuring element or kernel, which decides the nature of the operation. The kernel is usually a small matrix that slides through the image; at each position, the pixels it covers determine the output pixel (for example, by taking their minimum or maximum). Two basic morphological operators are erosion and dilation. Their variant forms are opening and closing. The one that should be used depends on the task at hand:

Erosion: Given a binary image in which the figure is represented by white pixels, erosion shrinks the thickness of the figure by one pixel on both its interior and its exterior boundaries. This method can be applied several times. It can be used for different reasons, depending on what you want to achieve, but normally it is used with dilation (which is explained next) in order to get rid of holes or noise. An example of erosion is shown here with the same digit, 3:

Figure 2.10: Example of erosion

Dilation: This method does the opposite of erosion. It increases the thickness of the object in a binary image by one pixel both on the interior and the exterior. It can also be applied to an image several times. This method can be used for different reasons, depending on what you want to achieve, but normally it is implemented along with erosion in order to get rid of holes in an image or noise. An example of dilation is shown here (we have implemented dilation on the image several times):

Figure 2.11: Example of dilation

Opening: This method performs erosion first, followed by dilation, and it is usually used for removing noise from an image.
Closing: This algorithm does the opposite of opening, as it performs dilation first before erosion. It is usually used for removing holes within an object:

Figure 2.12: Examples of opening and closing

As you can see, the opening method removes random noise from the image and the closing method works perfectly in fixing the small random holes within the image. In order to get rid of the holes of the output image from the opening method, a closing method could be applied.

There are more binary operations, but these are the basic ones.
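The four operations can be sketched with NumPy as minimum and maximum filters over a sliding window. This is an illustrative approximation rather than OpenCV's code: the helper names, the 3x3 square kernel, and the border handling (white padding for erosion, black padding for dilation, so that the image border itself neither erodes nor dilates anything) are assumptions made for this example.

```python
import numpy as np

def _window_filter(img, k, reducer, border):
    """Apply reducer (np.min or np.max) over every k x k window of img."""
    pad = k // 2
    padded = np.pad(img, pad, mode="constant", constant_values=border)
    out = np.empty_like(img)
    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            out[i, j] = reducer(padded[i:i + k, j:j + k])
    return out

def erode(img, k=3):
    # A pixel stays white only if every pixel under the kernel is white.
    return _window_filter(img, k, np.min, border=255)

def dilate(img, k=3):
    # A pixel becomes white if any pixel under the kernel is white.
    return _window_filter(img, k, np.max, border=0)

def opening(img, k=3):
    return dilate(erode(img, k), k)   # erosion first, then dilation

def closing(img, k=3):
    return erode(dilate(img, k), k)   # dilation first, then erosion

# A clean 5x5 white square on a 9x9 black canvas.
square = np.zeros((9, 9), dtype=np.uint8)
square[2:7, 2:7] = 255

noisy = square.copy()
noisy[0, 8] = 255      # isolated white speck (noise)

holed = square.copy()
holed[4, 4] = 0        # small hole inside the figure

print(np.array_equal(opening(noisy), square))  # True: the speck is removed
print(np.array_equal(closing(holed), square))  # True: the hole is filled
```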

Exercise 5: Applying the Various Morphological Transformations to an Image

In this exercise, we will be loading an image of a number, on which we will apply the morphological transformations that we have just learned about:

Open up your Google Colab interface.
Set the path of the directory:
cd /content/drive/My Drive/C13550/Lesson02/Exercise05/
Note
The path mentioned in step 2 may change, as per your folder setup on Google Drive.
Import the OpenCV, Matplotlib, and NumPy libraries. NumPy here is the fundamental package for scientific computing with Python and will help us create the kernels applied:
import cv2
import numpy as np
from matplotlib import pyplot as plt
Now type the code to load the Dataset/three.png image, which we are going to process in grayscale using OpenCV and show using Matplotlib:
Note
The three.png image can be found on GitHub in the Lesson02 | Exercise05 folder.
img = cv2.imread('Dataset/three.png',0)
plt.imshow(img,cmap='gray')
plt.xticks([]),plt.yticks([])
plt.savefig('ex2_1.jpg', bbox_inches='tight')
plt.show()

Figure 2.13: Result of plotting the loaded image
Let's apply erosion by using OpenCV methods.
The method used here is cv2.erode, and it takes three parameters: the image, a kernel that slides through the image, and the number of iterations, which is the number of times that it is executed:
kernel = np.ones((2,2),np.uint8)
erosion = cv2.erode(img,kernel,iterations = 1)
plt.imshow(erosion,cmap='gray')
plt.xticks([]),plt.yticks([])
plt.savefig('ex2_2.jpg', bbox_inches='tight')
plt.show()

Figure 2.14: Output of the erosion method using OpenCV
As we can see, the thickness of the figure has decreased.
We are going to do the same with dilation.
The method used here is cv2.dilate, and it takes three parameters: the image, the kernel, and the number of iterations:
kernel = np.ones((2,2),np.uint8)
dilation = cv2.dilate(img,kernel,iterations = 1)
plt.imshow(dilation,cmap='gray')
plt.xticks([]),plt.yticks([])
plt.savefig('ex2_3.jpg', bbox_inches='tight')
plt.show()

Figure 2.15: Output of the dilation method using OpenCV
As we can see, the thickness of the figure has increased.
Finally, let's put opening and closing into practice.
The method used here is cv2.morphologyEx, and it takes three parameters: the image, the method applied, and the kernel:
import random
random.seed(42)
def sp_noise(image,prob):
    '''
    Add salt and pepper noise to image
    prob: Probability of the noise
    '''
    output = np.zeros(image.shape,np.uint8)
    thres = 1 - prob
    for i in range(image.shape[0]):
        for j in range(image.shape[1]):
            rdn = random.random()
            if rdn < prob:
                output[i][j] = 0
            elif rdn > thres:
                output[i][j] = 255
            else:
                output[i][j] = image[i][j]
    return output
def sp_noise_on_figure(image,prob):
    '''
    Add pepper (black) noise only to the bright pixels of the figure
    prob: Probability of the noise
    '''
    # Start from a copy so that untouched pixels keep their original value
    output = image.copy()
    for i in range(image.shape[0]):
        for j in range(image.shape[1]):
            rdn = random.random()
            # Only pixels that belong to the figure (brighter than 100) are blacked out
            if rdn < prob and image[i][j] > 100:
                output[i][j] = 0
    return output
kernel = np.ones((2,2),np.uint8)
# Create thicker figure to work with
dilation = cv2.dilate(img, kernel, iterations = 1)
# Create noisy image
noise_img = sp_noise(dilation,0.05)
# Create image with noise in the figure
noise_img_on_image = sp_noise_on_figure(dilation,0.15)
# Apply Opening to image with normal noise
opening = cv2.morphologyEx(noise_img, cv2.MORPH_OPEN, kernel)
# Apply Closing to image with noise in the figure
closing = cv2.morphologyEx(noise_img_on_image, cv2.MORPH_CLOSE, kernel)
images = [noise_img,opening,noise_img_on_image,closing]
for i in range(4):
    plt.subplot(1,4,i+1),plt.imshow(images[i],'gray')
    plt.xticks([]),plt.yticks([])
plt.savefig('ex2_4.jpg', bbox_inches='tight')
plt.show()

Figure 2.16: Output of the opening method (left) and closing method (right) using OpenCV

Note

The entire code file can be found on GitHub in the Lesson02 | Exercise05 folder.

Blurring (Smoothing)

Image blurring performs convolution over an image with a filter kernel, which in simpler terms is multiplying a matrix of specific values on every part of the image, in order to smooth it. It is useful for removing noise and edges:

Averaging: In this method, we consider a box filter or kernel that takes the average of the pixels within the area of the kernel, replacing the central element by using convolution over the entire image.
Gaussian Blurring: The kernel applied here is Gaussian, instead of the box filter. It is used for removing Gaussian noise in a particular image.
Median Blurring: Similar to averaging, but this one replaces the central element with the median value of the pixels of the kernel. It actually has a very good effect on salt-and-pepper noise (that is, visible black or white spots in an image).
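A small NumPy sketch shows why the median handles salt-and-pepper noise better than the average. The helper name window_filter, the 3x3 window, and the edge padding are assumptions for illustration, and the image is a flat gray patch with one white "salt" pixel. A Gaussian blur would follow the same pattern with a weighted average instead of a plain one.

```python
import numpy as np

def window_filter(img, k, reducer):
    """Replace each pixel with reducer(k x k neighbourhood), e.g. np.mean or np.median."""
    pad = k // 2
    padded = np.pad(img.astype(np.float64), pad, mode="edge")
    out = np.empty(img.shape, dtype=np.float64)
    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            out[i, j] = reducer(padded[i:i + k, j:j + k])
    return np.rint(out).astype(np.uint8)

# Flat gray image with a single white "salt" pixel in the middle.
img = np.full((5, 5), 100, dtype=np.uint8)
img[2, 2] = 255

averaged = window_filter(img, 3, np.mean)
median = window_filter(img, 3, np.median)

print(averaged)  # the spike is smeared over its neighbours (117 instead of 100)
print(median)    # the spike is gone: every pixel is 100
```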

In Figure 2.17, we have applied the aforementioned methods:

Figure 2.17: Result of comparing different blurring methods

There are many more algorithms that could be applied, but these are the most important ones.

Exercise 6: Applying the Various Blurring Methods to an Image

In this exercise, we will be loading an image of a subway, to which we will apply the blurring methods:

Open up your Google Colab interface.
Set the path of the directory:
cd /content/drive/My Drive/C13550/Lesson02/Exercise06/
Note
The path mentioned in step 2 may be different according to your folder setup on Google Drive.
Import the OpenCV, Matplotlib, and NumPy libraries:
import cv2
from matplotlib import pyplot as plt
import numpy as np
Type the code to load the Dataset/subway.jpg image, which we are going to load in color using OpenCV, convert from BGR (the channel order OpenCV uses) to RGB, and show using Matplotlib:
Note
The subway.jpg image can be found on GitHub in the Lesson02 | Exercise06 folder.
img = cv2.imread('Dataset/subway.jpg')
#Method to convert the image to RGB
img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
plt.imshow(img)
plt.xticks([]),plt.yticks([])
# Save after hiding the ticks so that they do not appear in the saved file
plt.savefig('ex3_1.jpg', bbox_inches='tight')
plt.show()

Figure 2.18: Result of plotting the loaded subway image in RGB
Let's apply all the blurring methods:
The methods applied are cv2.blur, cv2.GaussianBlur, and cv2.medianBlur. All of them take an image as the first parameter. The first method takes only one more argument, that is, the kernel size. The second method takes the kernel size and the standard deviation (sigmaX and, optionally, sigmaY), and if both are given as zeros, they are calculated from the kernel size. The method mentioned last also takes only one more argument, which is the kernel size and must be an odd number:
blur = cv2.blur(img,(51,51)) # Apply normal Blurring
blurG = cv2.GaussianBlur(img,(51,51),0) # Gaussian Blurring
median = cv2.medianBlur(img,51) # Median Blurring
titles = ['Original Image','Averaging', 'Gaussian Blurring', 'Median Blurring']
images = [img, blur, blurG, median]
for i in range(4):
    plt.subplot(2,2,i+1),plt.imshow(images[i])
    plt.title(titles[i])
    plt.xticks([]),plt.yticks([])
plt.savefig('ex3_2.jpg', bbox_inches='tight')
plt.show()

Figure 2.19: Blurring methods with OpenCV

Now you know how to apply several blurring techniques to any image.

Exercise 7: Loading an Image and Applying the Learned Methods

In this exercise, we will be loading an image of a number and we will apply the methods that we have learned so far.

Note

The entire code is available on GitHub in the Lesson02 | Exercise07-09 folder.

Open up a new Google Colab interface, and mount your drive as mentioned in Exercise 4, Applying the Various Thresholds to an Image, of this chapter.
Set the path of the directory:
cd /content/drive/My Drive/C13550/Lesson02/Exercise07/
Note
The path mentioned in step 2 may be different according to your folder setup on Google Drive.
Import the corresponding dependencies: NumPy, OpenCV, and Matplotlib:
import numpy as np #Numpy
import cv2 #OpenCV
from matplotlib import pyplot as plt #Matplotlib
count = 0
Type the code to load the Dataset/number.jpg image, which we are going to process in grayscale using OpenCV and show using Matplotlib:
Note
The number.jpg image can be found on GitHub in the Lesson02 | Exercise07-09 | Dataset folder.
img = cv2.imread('Dataset/number.jpg',0)
plt.imshow(img,cmap='gray')
plt.xticks([]),plt.yticks([])
plt.show()

Figure 2.20: Result of loading the image with the number
If you want to recognize those digits using machine learning or any other algorithm, you need to simplify the visualization of them. Using thresholding seems to be the first logical step to proceed with this exercise. We have learned some thresholding methods, but the most commonly used one is Otsu's binarization, as it automatically calculates the threshold value without the user providing the details manually.
Apply Otsu's binarization to the grayscale image and show it using Matplotlib:
_,th1=cv2.threshold(img,0,255,cv2.THRESH_BINARY+cv2.THRESH_OTSU)
th1 = (255-th1)
# This step changes the black with white and vice versa in order to have white figures
plt.imshow(th1,cmap='gray')
plt.xticks([]),plt.yticks([])
plt.show()

Figure 2.21: Using Otsu's binarization thresholding on the image
In order to get rid of the lines in the background, we need to do some morphological transformations. First, start by applying the opening method, which removes small white details such as thin lines:
open1 = cv2.morphologyEx(th1, cv2.MORPH_OPEN, np.ones((4, 4),np.uint8))
plt.imshow(open1,cmap='gray')
plt.xticks([]),plt.yticks([])
plt.show()

Figure 2.22: Applying the opening method
Note
The lines in the background have been removed completely. Now a number prediction will be much easier.
In order to fill the holes that are visible in these digits, we need to apply the closing method. Apply the closing method to the preceding image:
close1 = cv2.morphologyEx(open1, cv2.MORPH_CLOSE, np.ones((8, 8), np.uint8))
plt.imshow(close1,cmap='gray')
plt.xticks([]),plt.yticks([])
plt.show()

Figure 2.23: Applying the closing method
There are still leftovers and imperfections around the digits. In order to remove these, an opening method with a bigger kernel would be the best choice. Now apply the corresponding method:
open2 = cv2.morphologyEx(close1, cv2.MORPH_OPEN,np.ones((7,12),np.uint8))
plt.imshow(open2,cmap='gray')
plt.xticks([]),plt.yticks([])
plt.show()

Figure 2.24: Applying the opening method with a kernel of a bigger size
Depending on the classifier that you use to predict the digits or the conditions of the given image, some other algorithms may need to be applied.
If you want to predict the numbers, you will need to predict them one by one. Thus, you should split the image into smaller images, one per digit.
Thankfully, OpenCV has a method to do this, and it's called cv2.findContours. In order to find contours, the figures must be white on a black background, which is why we inverted the image after thresholding. This piece of code is larger, but it is only required if you want to predict character by character:
# cv2.findContours returns (image, contours, hierarchy) in OpenCV 3 and (contours, hierarchy) in OpenCV 4; [-2] selects the contours in both
contours = cv2.findContours(open2, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)[-2] #Find contours
cntsSorted = sorted(contours, key=lambda x: cv2.contourArea(x), reverse=True) #Sort the contours by area, largest first
cntsLength = len(cntsSorted)
images = []
for idx in range(cntsLength): #Iterate over the contours
    x, y, w, h = cv2.boundingRect(cntsSorted[idx]) #Get its position and size
    ... # Rest of the code in Github
    images.append([x,sample_no]) #Add the image to the list of images and the X position
images = sorted(images, key=lambda x: x[0]) #Sort the list of images using the X position
{…}
Note
The entire code with added comments is available on GitHub in the Lesson02 | Exercise07-09 folder.

Figure 2.25: Extracted digits as the output

In the first part of the code, we are finding the contours of the image (the curve joining all the continuous points along the boundary and of the same color or intensity) to find every digit, which we then sort depending on the area of each contour (each digit).

After this, we loop over the contours, cropping the original image with the given contours, ending up with every number in a different image.

After this, we need to have all the images with the same shape, so we adapt the image to a given shape using NumPy and append the image to a list of images along with the X position.

Finally, we sort the list of images using the X position (from left to right, so they remain in order) and plot the results. We also save every single digit as an image so that we can use every digit separately afterward for any task we want.
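The cropping, resizing, and ordering steps described above could look roughly like the following NumPy sketch. This is not the code from the book's GitHub repository: the function name crop_and_pad, the centered zero padding, the canvas size, and the hard-coded bounding boxes (standing in for what cv2.boundingRect would return) are all assumptions made for illustration.

```python
import numpy as np

def crop_and_pad(img, boxes, size):
    """Hypothetical helper: crop each (x, y, w, h) box, centre it on a
    size x size black canvas, and return the crops ordered left to right.
    Assumes every box fits inside the canvas (w <= size and h <= size)."""
    samples = []
    for x, y, w, h in boxes:
        crop = img[y:y + h, x:x + w]
        canvas = np.zeros((size, size), dtype=img.dtype)
        top = (size - h) // 2
        left = (size - w) // 2
        canvas[top:top + h, left:left + w] = crop
        samples.append((x, canvas))
    # Sort by the X position only, so the digits keep their reading order.
    samples.sort(key=lambda item: item[0])
    return [canvas for _, canvas in samples]

# Synthetic binary image with two white blobs standing in for digits.
img = np.zeros((6, 12), dtype=np.uint8)
img[2:5, 1:3] = 255    # small blob on the left  (x=1, y=2, w=2, h=3)
img[1:5, 7:10] = 255   # larger blob on the right (x=7, y=1, w=3, h=4)

# Boxes listed largest-area first, mimicking the area-sorted contours.
boxes = [(7, 1, 3, 4), (1, 2, 2, 3)]
digits = crop_and_pad(img, boxes, size=6)

for d in digits:
    print(d.shape, int((d == 255).sum()))
```

Even though the larger blob comes first in the input, the output list starts with the left-hand blob, and both crops share the same 6x6 shape.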

Congratulations! You have successfully processed an image with text in it, obtained the text, and extracted every single character, and now the magic of machine learning can begin.