Musings of a Developer

Image processing using Python & Open-CV part-2

Hello and welcome back to the Open-CV with Python tutorial series. In the previous part, we looked at some applications of computer vision, how to get started with Intel’s Open-CV library, and how to perform some basic operations on image frames.

In this part of the tutorial, we are going to learn some of the more challenging functionality that Open-CV has to offer.

Let’s start with the corner detection method. As the name suggests, this technique detects all the corners inside any given image, Duh!

#CORNER DETECTION
import numpy as np
import cv2
img = cv2.imread('heregoesyouramazingimage')
gray = cv2.cvtColor(img,cv2.COLOR_BGR2GRAY)
gray = np.float32(gray)

corners_to_detect = 100
minimum_quality_score = 0.01
minimum_distance = 10

ST_corners = cv2.goodFeaturesToTrack(gray, corners_to_detect, minimum_quality_score, minimum_distance)
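# Pixel coordinates must be integers before we can draw on the image
ST_corners = ST_corners.astype(int)

for corner in ST_corners:
    x,y = corner.ravel()
    cv2.circle(img,(int(x),int(y)),3,255,-1)

# Show the image with the corners drawn on it and wait for a key press
cv2.imshow('ST_corners', img)
cv2.waitKey(0)
cv2.destroyAllWindows()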

  1. We first load in the image, convert it to grayscale and then convert it to float32.
  2. We then define some parameters that we pass to the “goodFeaturesToTrack” function, which implements the Shi-Tomasi corner detector.
  3. These parameters are 1) the maximum number of corners we want to detect, 2) the minimum quality a corner must have, relative to the best corner found, and 3) the minimum distance in pixels between detected corners. Go ahead and tinker with these numbers.
  4. Next, we iterate through the corners and draw a filled circle at each one, which marks all the detected corners on the image.
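
To get a feel for how the three parameters interact, here is a rough pure-Python sketch of the selection step that the Open-CV documentation describes for “goodFeaturesToTrack”: candidates scoring below minimum_quality_score times the best score are dropped, and the strongest remaining candidates are kept as long as they are at least minimum_distance apart. The function name select_corners and the candidate list are made up for illustration, and the scores are assumed to be already computed, which is the part the real function does from the image.

```python
import math

def select_corners(candidates, corners_to_detect, minimum_quality_score, minimum_distance):
    """candidates: list of (x, y, score) tuples with hypothetical, pre-computed scores."""
    if not candidates:
        return []
    best_score = max(score for _, _, score in candidates)
    # 1) Reject candidates that are weak relative to the best one
    strong = [c for c in candidates if c[2] >= minimum_quality_score * best_score]
    # 2) Look at the strongest candidates first
    strong.sort(key=lambda c: c[2], reverse=True)
    kept = []
    for x, y, score in strong:
        # 3) Skip a candidate that sits too close to a stronger, already kept corner
        if all(math.hypot(x - kx, y - ky) >= minimum_distance for kx, ky in kept):
            kept.append((x, y))
        if len(kept) == corners_to_detect:
            break
    return kept

candidates = [(10, 10, 0.90), (12, 11, 0.80), (50, 50, 0.50), (90, 20, 0.005)]
print(select_corners(candidates, 100, 0.01, 10))  # [(10, 10), (50, 50)]
```

The second candidate is dropped because it is only about 2 pixels away from a stronger one, and the last one because its score is under 1% of the best score. Lowering minimum_distance to 1 brings the second candidate back.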

This is not the only method available in the Open-CV library to detect corners. The other method to achieve similar results is the Harris Corner Detection method.

Now that we understand corner detection, let’s move on to foreground extraction. The idea is exactly what it sounds like: find the foreground and remove the background. This is much like what a green screen does, except that here we don’t need a green screen!

#FOREGROUND EXTRACTION
import numpy as np
import cv2
from matplotlib import pyplot as plt

img = cv2.imread('heregoesyouramazingimage')

mask = np.zeros(img.shape[:2],np.uint8)

background_model = np.zeros((1,65),np.float64)
foreground_model = np.zeros((1,65),np.float64)

rectangle = (161,79,150,150)

cv2.grabCut(img,mask,rectangle,background_model,foreground_model,5,cv2.GC_INIT_WITH_RECT)

mask2 = np.where((mask==2)|(mask==0),0,1).astype('uint8')

img = img*mask2[:,:,np.newaxis]

# OpenCV stores images as BGR, while matplotlib expects RGB
plt.imshow(cv2.cvtColor(img, cv2.COLOR_BGR2RGB))
plt.colorbar()
plt.show()

Breaking down the code bit by bit we get:

  1. Importing all the necessary libraries.
  2. Loading in the image.
  3. Creating an empty mask with the same height and width as the image.
  4. Creating the background and foreground models, two arrays of zeros that ‘grabCut’ uses internally.
  5. The really important part is defining the rectangle that encloses the foreground. Here the rectangle is (start_x, start_y, width, height).
  6. We then call the main ‘grabCut’ function from the Open-CV library and pass in all the necessary parameters, including the number of iterations (5). You can head to the official docs if you want to tinker with the parameters.
  7. Lastly, we turn the mask into zeros (background) and ones (foreground), multiply it with the input image, and get our final result.

Note: Find the proper coordinates for your image.
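
The mask step is the least obvious line in the code, so here is a small NumPy-only sketch of it. After ‘grabCut’ runs, every mask pixel holds one of four labels: 0 (sure background), 1 (sure foreground), 2 (probable background) or 3 (probable foreground). The tiny mask and the fake image below are made-up values for illustration only.

```python
import numpy as np

# Hypothetical 3x4 label mask, as grabCut might leave it:
# 0 = sure background, 1 = sure foreground, 2 = probable background, 3 = probable foreground
mask = np.array([[0, 2, 2, 0],
                 [2, 3, 1, 2],
                 [0, 3, 3, 0]], np.uint8)

# Same expression as in the tutorial code: background labels -> 0, foreground labels -> 1
mask2 = np.where((mask == 2) | (mask == 0), 0, 1).astype('uint8')

# A tiny fake 3-channel "image" filled with the value 200
img = np.full((3, 4, 3), 200, np.uint8)

# np.newaxis turns the (3, 4) mask into (3, 4, 1) so it broadcasts over the colour channels
cut = img * mask2[:, :, np.newaxis]
print(mask2)
print(cut[:, :, 0])
```

Every pixel labelled as background becomes 0 (black) in all three channels, and every foreground pixel keeps its original value.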


Foreground extraction was cool, right? Something similar is background reduction (also known as background subtraction). It is nothing but reducing the background to a minimum by detecting motion. This requires either a video or two images: one without the objects you want to track and one with them.
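
The two-image case can be sketched with plain NumPy. Note that this is simple frame differencing, a much cruder technique than the MOG2 subtractor used in the code that follows, which keeps a statistical model of the background. The helper name motion_mask, the threshold of 30 and the tiny frames are assumptions made for illustration.

```python
import numpy as np

def motion_mask(background, current, threshold=30):
    """Return a 0/255 mask of pixels that differ noticeably between two grayscale frames."""
    # Work in a signed type so the subtraction cannot wrap around in uint8
    diff = np.abs(current.astype(np.int16) - background.astype(np.int16))
    return np.where(diff > threshold, 255, 0).astype(np.uint8)

# Hypothetical 5x5 grayscale frames: an empty scene, then the same scene with a bright object
background = np.full((5, 5), 50, np.uint8)
current = background.copy()
current[1:3, 2:4] = 200

mask = motion_mask(background, current)
print(mask)
```

Only the four pixels where the object appeared end up white in the mask; everything that stayed the same is black.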

#BACKGROUND REDUCTION
import numpy as np
import cv2

cap = cv2.VideoCapture(0)
foreground_model = cv2.createBackgroundSubtractorMOG2()

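while True:
    ret, frame = cap.read()
    if not ret:  # stop if the camera returns no frame
        break
    foregroundmask = foreground_model.apply(frame)
    cv2.imshow('frame',frame)
    cv2.imshow('foreground',foregroundmask)

    # Bitwise & (not 'and') so the key code is actually compared with 'q'
    if cv2.waitKey(20) & 0xFF == ord('q'):
        break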

cap.release()
cv2.destroyAllWindows()

The code to do this is pretty straightforward and easy to understand. We create a background subtractor with the built-in ‘createBackgroundSubtractorMOG2()’ function and then call its ‘apply’ method on every frame. It returns a mask in which the moving foreground shows up bright and the background is black.
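
One detail of a capture loop like this deserves a closer look: the quit check. ‘cv2.waitKey’ returns an integer key code (or -1 when no key was pressed), and the usual idiom masks it with a bitwise & 0xFF before comparing it with ord('q'). Writing ‘and’ instead of ‘&’ silently breaks the check, because 0xFF == ord('q') is then evaluated on its own and is always False. The sketch below needs no camera; pressed_q and broken_check are made-up helper names.

```python
def pressed_q(key_code):
    """True if the low byte of a waitKey-style return value is the letter q."""
    return key_code & 0xFF == ord('q')

def broken_check(key_code):
    # 'and' joins two separate conditions; 0xFF == ord('q') is 255 == 113, i.e. False
    return bool(key_code and 0xFF == ord('q'))

print(pressed_q(ord('q')))      # True
print(pressed_q(-1))            # False, no key was pressed
print(broken_check(ord('q')))   # False, so the loop would never exit
```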

Morphological transformations are simple operations performed on an image based on its shape, normally on a binary image. They need two inputs: the original image and a kernel (also called a structuring element), which decides the nature of the operation. The two basic morphological operators are erosion and dilation; variants such as opening and closing build on them. We will see them one by one.

#MORPHOLOGICAL TRANSFORMATION
import cv2
import numpy as np

cap = cv2.VideoCapture(0)

while True:
    _, frame = cap.read()
    hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
    # This range covers every possible value; narrow it to isolate one colour
    color_value_1 = np.array([0, 0, 0])
    color_value_2 = np.array([255, 255, 255])
    mask = cv2.inRange(hsv, color_value_1, color_value_2)
    result = cv2.bitwise_and(frame, frame, mask = mask)
    kernel = np.ones((5,5), np.uint8)
    eroded_img = cv2.erode(mask, kernel, iterations = 1)
    dilated_img = cv2.dilate(mask, kernel, iterations = 1)
    opening = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)
    closing = cv2.morphologyEx(mask, cv2.MORPH_CLOSE, kernel)

    cv2.imshow('frame', frame)
    cv2.imshow('result', result)
    cv2.imshow('eroded_img', eroded_img)
    cv2.imshow('dilated_img', dilated_img)
    cv2.imshow('opening', opening)
    cv2.imshow('closing', closing)

    if cv2.waitKey(20) & 0xFF == ord('q'):
        break

cv2.destroyAllWindows()
cap.release()

Erosion works by “eroding” the edges of the white regions. We create a kernel, a small window (5 x 5 pixels here), and slide it over the image. At each position, if all of the pixels under the window are white, the output pixel is white; otherwise it is black. This helps eliminate small specks of white noise in the image. Dilation does the exact opposite: the output pixel is white if at least one pixel under the window is white, so the white regions grow.
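
The sliding-window rule can be written out directly in NumPy. This is a slow, illustrative sketch with a 3 x 3 window on a tiny binary image, not how ‘cv2.erode’ and ‘cv2.dilate’ are implemented. The padding values are an assumption, chosen so that the artificial border never changes the result.

```python
import numpy as np

def erode(binary, ksize=3):
    """Output is white (1) only where every pixel under the window is white."""
    pad = ksize // 2
    # Pad with white so the artificial border never turns a pixel black
    padded = np.pad(binary, pad, constant_values=1)
    out = np.zeros_like(binary)
    for r in range(binary.shape[0]):
        for c in range(binary.shape[1]):
            out[r, c] = padded[r:r + ksize, c:c + ksize].min()
    return out

def dilate(binary, ksize=3):
    """Output is white (1) where at least one pixel under the window is white."""
    pad = ksize // 2
    # Pad with black so the artificial border never turns a pixel white
    padded = np.pad(binary, pad, constant_values=0)
    out = np.zeros_like(binary)
    for r in range(binary.shape[0]):
        for c in range(binary.shape[1]):
            out[r, c] = padded[r:r + ksize, c:c + ksize].max()
    return out

img = np.zeros((5, 7), np.uint8)
img[1:4, 1:4] = 1   # a 3x3 white object
img[2, 5] = 1       # one pixel of white noise

eroded = erode(img)
dilated = dilate(img)
print(eroded)
print(dilated)
```

Erosion wipes out the single noise pixel but also shrinks the 3x3 object down to its centre pixel, while dilation makes both the object and the noise grow by one pixel in every direction.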

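Next come “opening” and “closing”. Opening is erosion followed by dilation, and it removes “false positives”, i.e. small white specks in the background. Closing is the exact opposite, dilation followed by erosion, and it removes “false negatives”, i.e. small black holes inside the object.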
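
Opening and closing are just the two basic operators chained in different orders, which a few lines of NumPy can show. As before, this is an illustrative sketch with a 3 x 3 window and made-up test images, not Open-CV’s own implementation; the helper _morph is hypothetical.

```python
import numpy as np

def _morph(binary, ksize, reducer, pad_value):
    """Slide a ksize x ksize window over the image and reduce each window to one value."""
    pad = ksize // 2
    padded = np.pad(binary, pad, constant_values=pad_value)
    rows, cols = binary.shape
    return np.array([[reducer(padded[r:r + ksize, c:c + ksize]) for c in range(cols)]
                     for r in range(rows)], dtype=binary.dtype)

def erode(binary, ksize=3):
    return _morph(binary, ksize, np.min, 1)

def dilate(binary, ksize=3):
    return _morph(binary, ksize, np.max, 0)

def opening(binary, ksize=3):
    return dilate(erode(binary, ksize), ksize)   # erosion first, then dilation

def closing(binary, ksize=3):
    return erode(dilate(binary, ksize), ksize)   # dilation first, then erosion

# Opening: a 3x3 object plus one pixel of white noise (a "false positive")
noisy = np.zeros((5, 7), np.uint8)
noisy[1:4, 1:4] = 1
noisy[2, 5] = 1
opened = opening(noisy)

# Closing: a 5x5 object with a one-pixel black hole (a "false negative")
holed = np.zeros((9, 9), np.uint8)
holed[2:7, 2:7] = 1
holed[4, 4] = 0
closed = closing(holed)

print(opened)
print(closed)
```

The opened image keeps the 3x3 object at its original size and loses the noise pixel; the closed image keeps the 5x5 object at its original size with the hole filled in.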

Well, well, well! Congratulations on making it this far. You now have a good overview of how Open-CV works. The next topic (and the last one in this post) is recording videos. Recording videos through Open-CV might seem like a trivial task at first but, unfortunately, it isn’t. So… let’s dive in.

#RECORDING VIDEOS
import os
import numpy as np
import cv2

filename = 'video.avi'

frames_per_seconds = 24.0

my_res = '720p'

cap = cv2.VideoCapture(0)

def change_resolution(cap, width, height):
    # Named constants for property ids 3 and 4
    cap.set(cv2.CAP_PROP_FRAME_WIDTH, width)
    cap.set(cv2.CAP_PROP_FRAME_HEIGHT, height)

STD_DIM = {
"480p" : (640, 480),
"720p" : (1280, 720),
"1080p" : (1920, 1080),
"4k" : (3840, 2160),
}

def set_dimensions(cap, res = '1080p'):
    # Fall back to 480p when the requested resolution is unknown
    width, height = STD_DIM['480p']
    if res in STD_DIM:
        width, height = STD_DIM[res]
    change_resolution(cap, width, height)
    return width, height

'''
Video encoding might require additional installs
Types of codecs: http://www.fourcc.org/codecs.php
Also, OpenCV 2 doesn't support cv2.VideoWriter_fourcc. Instead use cv2.cv.CV_FOURCC(*'XVID')
'''

VIDEO_TYPE = {
'avi' : cv2.VideoWriter_fourcc(*'XVID'),
'mp4' : cv2.VideoWriter_fourcc(*'XVID'),
# 'mp4' : cv2.VideoWriter_fourcc(*'H264'),  # alternative codec; a key can only appear once
}

def set_videotype(filename):
    filename, ext = os.path.splitext(filename)
    ext = ext.lstrip('.')  # splitext returns '.avi', but the dict keys have no dot
    if ext in VIDEO_TYPE:
        return VIDEO_TYPE[ext]
    return VIDEO_TYPE['avi']

dims = set_dimensions(cap, res = my_res)
video_type_cv2 = set_videotype(filename)

out = cv2.VideoWriter(filename, video_type_cv2, frames_per_seconds, dims)

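while True:
    ret, frame = cap.read()
    if not ret:  # stop if the camera returns no frame
        break
    out.write(frame)
    cv2.imshow('frame', frame)

    # Bitwise & (not 'and') so the key code is actually compared with 'q'
    if cv2.waitKey(20) & 0xFF == ord('q'):
        break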

cap.release()
out.release()
cv2.destroyAllWindows()

Breaking down the code piece by piece we get:

  1. The first part of the code imports the necessary libraries and defines the basic parameters: file name, frames per second (FPS) and resolution.
  2. Next, we define a dictionary of standard resolutions with their widths and heights, and a function that applies one of them to the capture. The function’s default argument is 1080p, and it falls back to 480p if the requested resolution is not in the dictionary.
  3. We define another dictionary that maps file extensions to video codecs, and a ‘set_videotype’ function that returns the codec for the given file name.
  4. The final setup step is to create the ‘cv2.VideoWriter’ object and pass in all the necessary parameters.
  5. Last but not least, inside a loop we read each frame from the camera and write it to the file with the writer’s ‘write’ method.
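
The extension lookup in step 3 has one trap worth showing in isolation: ‘os.path.splitext’ returns the extension with its leading dot. The sketch below uses plain strings as stand-ins for the FourCC values so it runs without Open-CV; CODEC_BY_EXTENSION and codec_for are made-up names, and the lower-casing is an extra assumption that the tutorial code does not make.

```python
import os

# Stand-in codec names; the tutorial code stores cv2.VideoWriter_fourcc(...) values instead
CODEC_BY_EXTENSION = {
    'avi': 'XVID',
    'mp4': 'H264',
}

def codec_for(filename, default='avi'):
    """Pick a codec name from the file extension, falling back to the default entry."""
    _, ext = os.path.splitext(filename)   # ext keeps its leading dot, e.g. '.avi'
    key = ext.lstrip('.').lower()         # normalise to 'avi' so it matches the dict keys
    return CODEC_BY_EXTENSION.get(key, CODEC_BY_EXTENSION[default])

print(os.path.splitext('video.avi'))   # ('video', '.avi')
print(codec_for('clip.MP4'))           # H264
print(codec_for('notes.txt'))          # XVID (fallback)
```

Without stripping the dot, the lookup would never match a key and every file would silently get the fallback codec.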

That’ll be it for this blog post, folks! Congratulations on making it this far.

I have a GitHub repository that contains all of the above code, well commented. The repository also contains all of the resources that I used to learn Open-CV.

Stay tuned. Until next time…!