Introduction

Power of Computer Vision – It is no exaggeration to call computer vision one of the most revolutionary technologies of the 21st century. From medical imaging to precision agriculture and even self-driving cars, computer vision has enabled the automation of a vast array of tasks that were once thought to require human beings.

In this article, we look at applications of computer vision and how they have transformed entire industries, to help you find fresh ideas for your business.

Origins of Computer Vision

Computer vision is a field that deals with how computers can process and derive meaningful information from images and videos. Before the dawn of machine learning and artificial intelligence, digital image processing algorithms were used for tasks such as image denoising and facial recognition. Computer vision emerged as a scientific discipline in the 1960s, with scientists of the time aiming to create computers that could mimic human vision. In less than six decades, computers have learned to detect objects with great accuracy, recognize faces and characters, and even drive cars autonomously.

Now let’s take a look at which problems this technology solves and how it can be used across industries.

Leap Into the World of Computer Vision

Object detection is one of the applications of computer vision in which the computer is tasked with identifying and labeling all items within an image. The items could be objects such as cars, animals, human beings, galaxy clusters, or anything the model has been trained to detect. After learning the defining traits of a particular item from several thousand training images, the computer can predict with confidence that what it sees is an instance of that item.

One of the state-of-the-art object detection models is YOLOv5 (You Only Look Once). YOLO is a family of object detection architectures and frameworks. It works by dividing an image into a grid of cells, with each cell responsible for detecting the objects that fall within it.
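The grid idea can be illustrated with a small sketch. It assumes the rule from the original YOLO formulation, where the cell containing the center of an object's bounding box is the one responsible for it. The function name `responsible_cell` and the 7x7 grid are illustrative choices, not part of any YOLO library, and real YOLO versions add refinements such as anchor boxes and multiple grid scales.

```python
def responsible_cell(box, image_size, grid_size):
    """Return the (row, col) of the grid cell that contains the box center.

    box: (x_min, y_min, x_max, y_max) in pixels.
    image_size: (width, height) in pixels.
    grid_size: number of cells per side (an S x S grid).
    """
    x_min, y_min, x_max, y_max = box
    width, height = image_size
    center_x = (x_min + x_max) / 2
    center_y = (y_min + y_max) / 2
    # Clamp so that a center lying on the right or bottom edge
    # still maps to the last cell instead of falling outside the grid.
    col = min(int(center_x / width * grid_size), grid_size - 1)
    row = min(int(center_y / height * grid_size), grid_size - 1)
    return row, col


# A 640x640 image split into a 7x7 grid; the box center is (320, 160).
print(responsible_cell((280, 120, 360, 200), (640, 640), 7))  # (1, 3)
```

In a trained detector, each cell would also predict box coordinates, a confidence score and class probabilities; the sketch covers only the assignment of an object to a cell.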

One of the most impactful business applications of object detection is in the emerging technology of precision agriculture. Precision agriculture seeks to increase farm yield while using fewer resources such as water, herbicides and insecticides. It has been enabled by low-cost, easy-to-fly agricultural drones equipped with multispectral or RGB cameras. The images captured by these cameras can then be fed to an object detection model like YOLO, which can be trained to differentiate between crops and weeds in the field.

The French startup XSun works along these lines. The company created SolarXOne, a solar-powered autonomous aircraft whose HD images give the farmer an accurate view of critical farmland.

Besides agriculture, computer vision is also applied in autonomous vehicle development, medical image processing, military applications, etc.

Facial recognition

Facial recognition is a way of identifying an individual by their face. Because it reduces the authentication vulnerabilities of systems, facial recognition technology is widely used for security and law enforcement.

Advances in deep learning are the engine behind the success of facial recognition technology. One of the modern facial recognition models is FaceNet, a neural network model developed by Google. FaceNet takes an image as input and outputs a vector of 128 numbers called an embedding. The embedding represents the most important features of a person's face, so images with similar embeddings are likely to show the same person. FaceNet achieved 95.12% accuracy on the YouTube Faces dataset.
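To make the idea of "similar embeddings" concrete, here is a minimal sketch of comparing two embeddings by Euclidean distance. It does not call FaceNet itself: the vectors are toy 4-number examples rather than real 128-number embeddings, and the function names and the threshold of 1.0 are assumptions chosen for illustration. In practice the threshold would be tuned on validation data.

```python
import math


def euclidean_distance(emb_a, emb_b):
    """Straight-line distance between two embeddings of equal length."""
    if len(emb_a) != len(emb_b):
        raise ValueError("embeddings must have the same length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(emb_a, emb_b)))


def same_person(emb_a, emb_b, threshold=1.0):
    """Treat two faces as the same person if their embeddings are close.

    The threshold is a hypothetical value, not one taken from FaceNet.
    """
    return euclidean_distance(emb_a, emb_b) < threshold


# Toy embeddings: the first two are close together, the third is far away.
photo_1 = [0.1, 0.2, 0.3, 0.4]
photo_2 = [0.1, 0.2, 0.3, 0.5]
photo_3 = [-0.9, 0.8, -0.7, 0.6]

print(same_person(photo_1, photo_2))
print(same_person(photo_1, photo_3))
```

A smaller threshold reduces false matches at the cost of missing some true matches, which is the usual trade-off in a verification system.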

One of the most impressive uses of facial recognition technology is in preventing retail crime. Facial recognition is used to identify known shoplifters or people with a known history of retail crime when they enter a retail store. According to some studies, facial recognition reduces violent incidents in retail stores by as much as 91%.

In addition to security, facial recognition technology is used for employee tracking, patient screening procedures in healthcare and more.

Optical character recognition (OCR)

Optical Character Recognition (OCR) refers to the process of converting scanned images containing handwritten or printed text into machine-encoded text. OCR enables documents such as passports, IDs, forms and even entire books to be scanned and converted into machine-readable text.

OCR has three distinct stages.

Image Preprocessing

To increase the odds of successful character recognition, a series of operations is performed on the scanned images. Some of them are listed below:

  • De-skew: an image that was not aligned correctly when scanned may need to be rotated a few degrees clockwise or counter-clockwise
  • Binarisation: conversion of an image to black and white (also known as a binary image)
  • Layout analysis: identification of columns, paragraphs, captions. Very important for multi-column layouts like newspapers.
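As an illustration of the binarisation step listed above, the sketch below converts a grayscale image, represented as rows of 0-255 pixel values, into a pure black-and-white image. The function name `binarise` and the fixed threshold of 128 are assumptions for the example; real OCR pipelines typically pick the threshold adaptively, for instance with Otsu's method.

```python
def binarise(gray, threshold=128):
    """Convert a grayscale image to black (0) and white (255).

    gray: list of rows, each a list of pixel values from 0 to 255.
    threshold: pixels at or above this value become white, the rest black.
    """
    return [
        [255 if pixel >= threshold else 0 for pixel in row]
        for row in gray
    ]


# A tiny 2x2 "scan": dark ink pixels next to light paper pixels.
scan = [
    [12, 200],
    [130, 127],
]
print(binarise(scan))  # [[0, 255], [255, 0]]
```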

Text Recognition

There are various approaches to OCR. Software such as CuneiForm and Tesseract uses a two-pass approach to character recognition. In the first pass, an attempt is made to recognize each word in turn. Each word that is recognized satisfactorily is passed to an adaptive classifier as training data. The adaptive classifier is then used to recognize the remaining words on the second pass.

A newer technique known as iterative OCR automatically crops a document into sections based on page layout. OCR is then performed on each section individually in order to maximize page-level OCR accuracy.

OCR technology has been one of the main fintech trends for years. According to the ICC Global Survey 2020, 28% of banks, including global giants such as HSBC and Standard Chartered, are using OCR for data extraction and for creating searchable documents.

Banks deal with a large volume of paper documents that need to be digitized, most commonly identity documents, contracts, receipts and financial statements. Banks use OCR to scan these documents so that analysts and employees in the relevant departments can easily access and look up information.

Human Pose Estimation

Human Pose Estimation refers to the task of predicting the poses of body parts such as joints in images or videos. In Computer Vision lingo, pose means an object’s position and orientation relative to some coordinate frame. Understanding human pose allows computers to have a more comprehensive understanding of human motion. This allows for applications in the areas of computer-guided personal fitness, physical therapy, entertainment and robotics.

Pose estimation occurs in two distinct phases.

  1. An input RGB image is fed through a convolutional neural network architecture. The neural network outputs a set of heatmaps, each of which corresponds to a key body part.
  2. A special multi-pose decoding algorithm is used to detect poses and pose confidence scores.
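The two phases above can be sketched in a simplified form. The example below assumes the heatmaps have already been produced by a network and handles only a single person: for each body part it takes the highest-scoring heatmap location as the keypoint, then averages the keypoint scores into a pose confidence. This is a simplification rather than the multi-pose decoding algorithm itself, and the function names and toy heatmaps are hypothetical.

```python
def decode_keypoints(heatmaps):
    """Pick the highest-scoring location in each body-part heatmap.

    heatmaps: dict mapping a part name to a 2D list of scores.
    Returns a dict mapping each part name to (row, col, score).
    """
    keypoints = {}
    for part, heatmap in heatmaps.items():
        best = (0, 0, heatmap[0][0])
        for r, row in enumerate(heatmap):
            for c, score in enumerate(row):
                if score > best[2]:
                    best = (r, c, score)
        keypoints[part] = best
    return keypoints


def pose_confidence(keypoints):
    """Average keypoint score, used here as a simple pose confidence."""
    return sum(score for _, _, score in keypoints.values()) / len(keypoints)


# Toy 2x2 heatmaps for two body parts.
heatmaps = {
    "nose": [[0.1, 0.75], [0.2, 0.3]],
    "left_wrist": [[0.0, 0.1], [0.5, 0.2]],
}
keypoints = decode_keypoints(heatmaps)
print(keypoints)
print(pose_confidence(keypoints))
```

Heatmap coordinates are at a lower resolution than the input image, so a real decoder would also map each keypoint back to image coordinates, and a multi-pose decoder would additionally group keypoints by person.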

By far the most popular pose estimation library is Google’s PoseNet. PoseNet is a machine learning model for real-time pose estimation in the browser. PoseNet can be used to estimate either a single pose or multiple poses.

Human pose estimation was once a major challenge because it requires substantial computing power and hardware support. Today, cloud services can meet these requirements: the various cloud deployment models (private, public, hybrid, etc.) offer the flexibility, scalability and computational power that pose estimation needs.

Numerous startups are exploring the idea of using pose estimation algorithms to develop personal fitness apps. These apps can guide the users on how to perform certain exercises, thereby minimizing the risk of injury as well as the cost of having a one-on-one personal fitness trainer.

Another application of pose estimation is in the advancing industry of self-driving cars. By understanding a pedestrian's pose, an autonomous car can decide whether or not to apply the brakes to prevent an accident.

Human pose estimation is also one of the metaverse technologies. Using human pose estimation technology, users can synchronize their motion to a chosen avatar and dive into the digital world.

Wrapping up

In this article, we have seen the state of the art in computer vision as well as some of its applications in various industries. The field is still in its infancy, however, and as better algorithms are developed and breakthroughs continue to be made, the number of applications is likely to explode.

Using ever-cheaper hardware, powerful vision algorithms will continue to deeply impact our lives. However, remember that before computer vision becomes mainstream, you have a chance to use this technology as a competitive advantage and stay ahead of the game.

By Evgeniy Krasnokutsky,

AI/ML Team Leader at MobiDev, PhD