What is computer vision? AI for images and video

Computer vision identifies and often locates objects in digital images and videos. Since living organisms process images with their visual cortex, many researchers have taken the architecture of the mammalian visual cortex as a model for neural networks designed to perform image recognition. The biological research goes back to the 1950s.

Computer vision has progressed remarkably over the last 20 years. While not yet perfect, some computer vision systems achieve 99% accuracy, and others run acceptably on mobile devices.

The breakthrough in neural networks for vision was Yann LeCun’s 1998 LeNet-5, a seven-level convolutional neural network that recognizes handwritten digits digitized as 32×32 pixel images. Analyzing higher-resolution images would require expanding the LeNet-5 network to more neurons and more layers.

Today’s best image classification models can identify diverse catalogs of objects at HD resolution in color. In addition to pure deep neural networks (DNNs), people sometimes use hybrid vision models, which combine deep learning with classical machine-learning algorithms that perform specific sub-tasks.

Deep learning has also been applied to vision problems beyond basic image classification, including image classification with localization, object detection, object segmentation, image style transfer, image colorization, image reconstruction, image super-resolution, and image synthesis.

How does computer vision work?

Computer vision algorithms usually rely on convolutional neural networks, or CNNs. CNNs typically use convolutional, pooling, ReLU, fully connected, and loss layers to simulate a visual cortex.

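The convolutional layer slides a small filter across the image and computes a weighted sum over each of many small overlapping regions, which is the discrete counterpart of taking an integral. The pooling layer performs a form of non-linear down-sampling, for example by keeping only the maximum value in each region. ReLU layers apply the non-saturating activation function f(x) = max(0,x).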
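To make the three operations concrete, here is a minimal sketch in plain Python, not the code any particular framework uses. The function names (conv2d, relu, max_pool), the 5×5 test image, and the 2×2 edge-detecting filter are all illustrative assumptions. As in most deep learning libraries, the "convolution" here does not flip the filter, and it uses no padding and a stride of 1.

```python
def conv2d(image, kernel):
    """Weighted sum of each overlapping kernel-sized region (no padding, stride 1)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def relu(feature_map):
    """Apply f(x) = max(0, x) to every value."""
    return [[max(0, v) for v in row] for row in feature_map]


def max_pool(feature_map, size=2):
    """Down-sample by keeping the maximum of each non-overlapping size x size block."""
    out_h = len(feature_map) // size
    out_w = len(feature_map[0]) // size
    return [
        [
            max(feature_map[r * size + i][c * size + j]
                for i in range(size) for j in range(size))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


# Hypothetical 5x5 grayscale image: a bright vertical stripe on a dark background.
image = [[0, 1, 1, 0, 0] for _ in range(5)]

# Hypothetical filter that responds to dark-to-bright transitions (right minus left).
kernel = [[-1, 1],
          [-1, 1]]

features = conv2d(image, kernel)   # 4x4 map; each row is [2, 0, -2, 0]
activated = relu(features)         # negative responses clipped to 0
pooled = max_pool(activated)       # 2x2 map: [[2, 0], [2, 0]]
```

The filter responds positively at the left edge of the stripe and negatively at the right edge; ReLU discards the negative response, and pooling halves each dimension while keeping the strongest activation in each block. In a trained CNN the filter values are learned rather than written by hand.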

In a fully connected layer, the neurons have connections to all activations in the previous layer. A loss layer computes how the network training penalizes the deviation between the predicted and true labels, typically using a Softmax function followed by a cross-entropy loss for classification.
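The final stages can be sketched the same way. This is an illustrative example under stated assumptions rather than a reference implementation: the helper names (fully_connected, softmax, cross_entropy), the two input features, and the hand-picked weights for three classes are all hypothetical, and a real network would learn its weights during training.

```python
import math


def fully_connected(inputs, weights, biases):
    """Each output neuron is a weighted sum of ALL inputs plus a bias."""
    return [
        sum(w * x for w, x in zip(row, inputs)) + b
        for row, b in zip(weights, biases)
    ]


def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probs, true_class):
    """Penalty is the negative log of the probability given to the true label."""
    return -math.log(probs[true_class])


# Hypothetical activations from the previous layer and hand-picked parameters.
features = [1.0, 2.0]
weights = [[1.0, 0.0],   # class 0
           [0.0, 1.0],   # class 1
           [1.0, 1.0]]   # class 2
biases = [0.0, 0.0, -1.0]

logits = fully_connected(features, weights, biases)  # [1.0, 2.0, 2.0]
probs = softmax(logits)

loss_if_class_1 = cross_entropy(probs, 1)  # true label got a high score
loss_if_class_0 = cross_entropy(probs, 0)  # true label got a low score
```

The loss is larger when the true label is the class the network scored lowest, which is the signal training uses to adjust the weights.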

Copyright © 2020 IDG Communications, Inc.
