Machine learning and neural networks are expanding our understanding of data and the insights it holds. From a business standpoint, neural networks are engines for generating opportunities: they make sense of data and let you benefit from it.
Convolutional Neural Networks (CNNs) hold a special place in that regard. As the foundation of computer vision applications, the development and implementation of these complex systems show us:
- how many different insights are behind visual content;
- how this information affects the quality of the service/product and overall customer satisfaction.
In this article, we will explain what a CNN is, how it operates, and also look at its most prominent business applications.
A Convolutional Neural Network is a type of deep learning neural network used primarily for computer vision and image recognition tasks. These tasks include the following:
- Image recognition and OCR
- Object detection for self-driving cars
- Face recognition on social media
- Image analysis in healthcare
The term “convolutional” refers to convolution, a mathematical operation that combines two functions into a third by integrating (or, for discrete data, summing) their product as one function slides across the other. Convolution describes how the shape of one function is modified by the other. In other words, it is all about the relations between elements and their operation as a whole.
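To make the definition more concrete, here is a minimal sketch of discrete convolution in plain Python. The helper name `convolve` and the sample values are our own illustration, not part of any particular library.

```python
def convolve(signal, kernel):
    """Full discrete convolution: out[n] is the sum over k of signal[k] * kernel[n - k]."""
    out = [0.0] * (len(signal) + len(kernel) - 1)
    for i, s in enumerate(signal):
        for j, k in enumerate(kernel):
            # Each input value spreads its contribution over the kernel's footprint.
            out[i + j] += s * k
    return out


# A step signal smoothed by a two-tap averaging kernel (values chosen for illustration).
signal = [0, 0, 1, 1, 1]
kernel = [0.5, 0.5]
smoothed = convolve(signal, kernel)
print(smoothed)
```

The sharp step in the input becomes a gradual ramp in the output, which is the kernel shaping the signal. Most deep learning frameworks skip the kernel flip and compute cross-correlation instead; since the kernel values are learned, this makes no practical difference.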
The primary tasks of convolutional neural networks are the following:
- classify visual content (basically, describe what they “see”),
- recognize objects within a scene (for example, eyes, nose, lips, and ears on a face),
- group recognized objects into clusters (for example, eyes with eyes, noses with noses).
The other prominent application of CNNs is preparing the groundwork for different types of data analysis.
CNNs are used for Optical Character Recognition (OCR) to classify and cluster individual elements such as letters and numbers and subsequently put them together into a coherent whole. CNNs are also applied to recognize and transcribe the spoken word.
Sentiment analysis also relies on the classification capabilities of CNNs.
Now, let’s explain the mechanics behind the Convolutional Neural Network.
A Convolutional Neural Network architecture consists of four types of layers:
- Convolutional layer - where the action starts. The convolutional layer is designed to identify the features of an image. Usually, it goes from the general (e.g., shapes) to the specific (e.g., identifying elements of an object, the face of a certain man, etc.).
- Next comes the Rectified Linear Unit layer (aka ReLU). This layer is an extension of the convolutional layer. The purpose of ReLU is to introduce non-linearity by setting the negative values in the feature map to zero. It is the process of stripping an image of excessive fat to provide better feature extraction.
- The pooling layer is designed to reduce the size of the feature maps it receives, i.e., perform downsampling. In other words, it concentrates on the meaty parts of the received information.
- The fully connected layer is a standard feed-forward neural network. It is the final straight before the finish line, where the extracted features are combined into the final result, such as a class label.
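The four layers above can be sketched as four small functions in plain Python. This is a toy forward pass under our own assumptions: the kernel and the fully connected weights are hand-picked for illustration, whereas a real network would learn them during training and would be built with a deep learning framework.

```python
def conv2d(image, kernel):
    """Slide the kernel over the image (no padding, stride 1) and sum the products."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j] for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def relu(feature_map):
    """Replace every negative activation with zero."""
    return [[max(0, v) for v in row] for row in feature_map]


def max_pool(feature_map, size=2):
    """Keep the largest value in each non-overlapping size x size window."""
    return [
        [
            max(feature_map[r + i][c + j] for i in range(size) for j in range(size))
            for c in range(0, len(feature_map[0]) - size + 1, size)
        ]
        for r in range(0, len(feature_map) - size + 1, size)
    ]


def fully_connected(feature_map, weights, bias):
    """Flatten the pooled map and compute one weighted sum per output class."""
    flat = [v for row in feature_map for v in row]
    return [sum(w * x for w, x in zip(ws, flat)) + b for ws, b in zip(weights, bias)]


# 5x5 grayscale image with a bright vertical stripe in columns 2 and 3.
image = [[0, 0, 1, 1, 0] for _ in range(5)]
# Hand-picked kernel that responds to dark-to-bright vertical edges.
kernel = [[-1, 1], [-1, 1]]

features = conv2d(image, kernel)   # 4x4 feature map, negative on the bright-to-dark edge
activated = relu(features)         # negative responses clipped to zero
pooled = max_pool(activated)       # 2x2 map
scores = fully_connected(pooled, weights=[[1, 1, 1, 1], [-1, -1, -1, -1]], bias=[0, 0])
print(pooled, scores)
```

Each function maps to one bullet: the convolution finds the edge, ReLU discards the negative response, pooling shrinks the map from 4x4 to 2x2, and the fully connected step turns what is left into one score per class.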
Let’s explain how a CNN works in the case of image recognition.
- A CNN perceives an image as a volume, a three-dimensional object. Usually, digital color images use Red-Green-Blue (RGB) encoding. This means that convolutional networks understand images as three distinct channels of color stacked on top of each other.
- CNN groups pixels and processes them through a set of filters designed to get certain kinds of results (for example, to recognize geometrical shapes on an image). The number of filters applied usually depends on the complexity of an image and the purpose of recognition.
- The pooling layer then reduces the size of the resulting feature maps, i.e., performs downsampling.
As a result, you get a recognized image with a set of identifying credentials and a data layout that represents a blueprint of a picture of a specified kind.
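The idea of an image as a volume processed by a set of filters can be sketched as follows. The tiny 3x3 image and the two 1x1 filters are illustrative assumptions; real filters are larger and learned from data.

```python
def conv_volume(image, filt):
    """Apply one filter to an H x W x C image, summing products over the window and all channels."""
    fh, fw = len(filt), len(filt[0])
    channels = len(filt[0][0])
    out_h, out_w = len(image) - fh + 1, len(image[0]) - fw + 1
    return [
        [
            sum(
                image[r + i][c + j][ch] * filt[i][j][ch]
                for i in range(fh) for j in range(fw) for ch in range(channels)
            )
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


# A 3x3 RGB image: each pixel is [red, green, blue].
# The left column is pure red, the rest is pure blue.
RED, BLUE = [255, 0, 0], [0, 0, 255]
image = [[RED, BLUE, BLUE] for _ in range(3)]

# Two illustrative 1x1 filters: one responds to red, the other to blue.
filters = {
    "red": [[[1, 0, 0]]],
    "blue": [[[0, 0, 1]]],
}

# One feature map per filter; stacked together they form the next layer's input volume.
feature_maps = {name: conv_volume(image, f) for name, f in filters.items()}
print(feature_maps["red"][0])   # red response along the first row
print(feature_maps["blue"][0])  # blue response along the first row
```

Each filter spans all three color channels and produces a single two-dimensional feature map, so the number of filters decides the depth of the output volume.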
Now let’s take a look at the most prominent business applications of CNNs.
Image recognition and classification in its various forms is the primary field of use for convolutional neural networks. It is also the one use case that involves the most progressive frameworks (especially, in the case of medical imaging).
The purpose of the CNN image classification is the following:
- Deconstruct an image and identify its distinct features (the job of a supervised machine learning classification algorithm)
- Reduce the description to its key credentials (the job of an unsupervised dimensionality reduction algorithm)
The following fields are using this process:
- Image tagging algorithms are the most basic type of image classification. An image tag is a word or word combination that describes the image and makes it easier to find. Search engines like Google, social networks like Facebook, and eCommerce marketplaces like Amazon all use this technique. It is also one of the foundational elements of visual search. Tagging includes object recognition of the content and, in more sophisticated cases, even a sentiment analysis of the tone of the picture.
- Visual Search – this technique involves matching an input image with the available database. In addition to text tags that provide basic navigation, visual search analyzes the image and looks for images with similar credentials. For example, this is how Google can find versions of the same image but in different sizes.
- Recommender engines are another field where image classification and object recognition are widely applied. For example, Amazon uses image recognition CNNs to provide better suggestions in the “you might also like / people also buy” section. The basis of the assumption is the user’s expressed behavior. The products themselves are matched on visual criteria - for example, red shoes and red lipstick for a red dress. Pinterest uses image recognition CNNs differently: it relies more on matching visual credentials, which results in pure visual matching supplemented with tagging.
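Visual search and visual recommendations both boil down to comparing feature vectors. Below is a minimal sketch that assumes a CNN has already turned each image into a short feature vector; the catalog entries and their numbers are hypothetical, and cosine similarity is one common choice of measure, not necessarily the one these companies use.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two feature vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def visual_search(query, catalog, top_k=2):
    """Rank catalog items by the similarity of their feature vectors to the query vector."""
    ranked = sorted(
        catalog,
        key=lambda name: cosine_similarity(query, catalog[name]),
        reverse=True,
    )
    return ranked[:top_k]


# Hypothetical feature vectors; in practice a CNN would produce them from the images.
catalog = {
    "red_dress": [0.9, 0.1, 0.0],
    "red_shoes": [0.8, 0.2, 0.1],
    "blue_jeans": [0.0, 0.2, 0.9],
}
query = [1.0, 0.1, 0.0]  # features of the image the user uploaded

print(visual_search(query, catalog))
```

The two red items rank above the blue one because their vectors point in nearly the same direction as the query, regardless of tags.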
Face recognition deserves a separate mention. It is a subdivision of image recognition dedicated to the comprehension of more complex types of images - human faces (or other living beings, animals, fish, and insects included).
The difference between straight image recognition and face recognition is in the complexity of the operation. There is an additional layer of work involved.
- First goes basic object recognition - the shape of the face and its features are recognized.
- Then the features of the face are further analyzed to identify its key credentials. For example, it can be the shape of the nose, the skin tone and texture, or the presence of scars, hair, or other anomalies on the surface.
- Then the sum of these credentials is calculated into the image data perception of the appearance of a particular human being. This process involves studying numerous samples that present the subject in a different form (for example, with or without sunglasses).
- Then the input image is compared with the database, and that’s how the system recognizes a particular face.
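The last two steps, building a profile from several samples and comparing a new face against the database, can be sketched like this. The feature vectors, the names, and the `max_distance` threshold are hypothetical; a real system would obtain the vectors from a trained CNN and tune the threshold on real data.

```python
import math


def mean_vector(samples):
    """Average several feature vectors of the same person into one reference profile."""
    return [sum(values) / len(samples) for values in zip(*samples)]


def identify(face, profiles, max_distance=0.5):
    """Return the closest enrolled person, or None if nobody is close enough."""
    name, distance = min(
        ((person, math.dist(face, profile)) for person, profile in profiles.items()),
        key=lambda pair: pair[1],
    )
    return name if distance <= max_distance else None


# Hypothetical feature vectors per person, e.g. with and without sunglasses.
enrolled = {
    "alice": [[0.9, 0.1, 0.3], [0.7, 0.1, 0.5]],
    "bob": [[0.1, 0.8, 0.2], [0.3, 0.8, 0.4]],
}
profiles = {person: mean_vector(samples) for person, samples in enrolled.items()}

print(identify([0.8, 0.2, 0.4], profiles))  # close to alice's profile
print(identify([5.0, 5.0, 5.0], profiles))  # far from everyone
```

Averaging several samples makes the profile less sensitive to any single photo, and the distance threshold lets the system answer "unknown" instead of forcing a match.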
These days, face recognition is widely used in social media like Facebook, both for social networking and entertainment purposes.
- In social networking, face recognition streamlines the often tedious process of tagging people in photos. This feature is especially helpful when you need to tag a couple of hundred images from a conference, or when there are way too many faces to tag. So if you are going to build your own social network, think about this feature.
- In entertainment, face recognition lays the groundwork for further transformations and manipulations. Facebook Messenger’s filters and Snapchat Looksery filters are the most prominent examples. The filters start from the autogenerated basic layout of the face and attach new elements or effects.
Facial recognition technology is slowly establishing itself as a viable option for person identification.
While face recognition algorithms alone can’t serve as guaranteed verification of a person’s identity on par with fingerprints and legal documents, face recognition is very helpful in identifying a person in cases of limited information - for example, from surveillance camera footage or a covert video recording.
Optical Character Recognition, aka OCR, is a variation of image recognition specifically designed to process written and printed symbols, graphs, and charts. Just like face recognition, it involves a more complicated process with more moving parts.
At its core, OCR is a combination of computer vision and natural language processing. First, the image is recognized and deconstructed into characters; then, the characters are assembled into a coherent whole.
Here’s how it works:
- First step: image recognition. The image is scanned for elements that resemble written characters (either specific characters or characters in general).
- Second step: each character is broken down into the key credentials that identify it as such (for example, the particular shape of the letters “S” or “Z”).
- Third step: the identified character image is matched with the respective character encoding.
- Fourth step: the recognized characters are compiled into text according to the visual layout of the input image (an RNN partially handles this bit).
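The steps above can be sketched with a toy example. Simple template matching stands in for the CNN classifier here, and the 3x3 glyph bitmaps, positions, and helper names are all hypothetical; step one (locating the glyphs on the page) is assumed to be done already.

```python
# Hypothetical 3x3 glyph templates; 1 is ink, 0 is background.
TEMPLATES = {
    "L": ((1, 0, 0), (1, 0, 0), (1, 1, 1)),
    "T": ((1, 1, 1), (0, 1, 0), (0, 1, 0)),
    "I": ((0, 1, 0), (0, 1, 0), (0, 1, 0)),
}


def classify_glyph(glyph):
    """Steps 2-3: pick the character whose template agrees with the glyph on the most pixels."""
    def matching_pixels(template):
        return sum(
            g == t
            for g_row, t_row in zip(glyph, template)
            for g, t in zip(g_row, t_row)
        )
    return max(TEMPLATES, key=lambda ch: matching_pixels(TEMPLATES[ch]))


def read_line(glyphs_with_positions):
    """Step 4: order the recognized characters left to right by their x position."""
    ordered = sorted(glyphs_with_positions, key=lambda item: item[0])
    return "".join(classify_glyph(glyph) for _, glyph in ordered)


# Each entry is (x position, glyph bitmap), listed in arbitrary order.
# The "T" has one noisy pixel to show that matching tolerates small defects.
detected = [
    (20, ((1, 1, 1), (0, 1, 0), (0, 1, 1))),  # noisy T
    (0, ((1, 0, 0), (1, 0, 0), (1, 1, 1))),   # L
    (10, ((0, 1, 0), (0, 1, 0), (0, 1, 0))),  # I
]
print(read_line(detected))
```

A production OCR system replaces the pixel-counting step with a trained classifier and handles multi-line layouts, but the flow from glyph to character code to ordered text is the same.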
CNNs also power image tagging and further descriptions of image content for better indexing and navigation. Many eCommerce platforms, such as Amazon, which try to engage users with every bit of product information, use it for a more significant impact.
Optical Character Recognition of handwriting is widely applied in the sensitive legal realms, such as banking and insurance.
Personal signature recognition serves as an additional layer of validating and verifying the proceedings. The process resembles face recognition, minus the generalization.
Just like the face, a signature contains unique features that make it distinct from the others.
Unlike the face, signatures contain a minimal amount of generic elements with the majority of data being unique credentials (for example, the infamous Donald Trump “demon screaming” signature - only one person can make it the way it is).
This shifts the focus from matching key credentials to checking the compliance of a particular sample with the essential credentials of the typical signature of a specific person.
On the other hand, digitizing various documents and systemizing legacy data is one of the prime uses of Optical Character Recognition.
The formatting of the text plays a significant role, as it is crucial to transcribe the document’s content accurately. OCR algorithms do that by referencing document templates, which means the whole operation resembles an elaborate “connect the dots” game.
Healthcare is the industry where all the cutting-edge technologies get their trial by fire.
If you want to determine the practical worth of a particular technology - try using it for some healthcare purposes. Image recognition is no different.
Medical Image Computing is probably the most exciting application of image recognition convolutional neural networks due to the sheer complexity and diversity of the task.
Unlike consumer-level image recognition, whose task is more or less to identify the content of an image, medical imaging involves a whole lot of additional data analysis that stems from the initial image recognition.
For instance, a medical image classification CNN can be used to detect anomalies on X-ray or MRI images, in some cases with higher precision than the human eye.
Such systems can also show how a sequence of images changes over time by highlighting the differences between them (the sequence analysis itself is relegated to Recurrent Neural Networks). This feature prepares the ground for further predictive analytics.
Medical image classification relies on vast databases that include public health records (which serve as a training basis for the algorithms) and patients’ private data and test results. Together they make a powerhouse analytical platform that helps to keep an eye on the current state of the patient and also predict possible outcomes.
Saving lives is a top priority in healthcare, and it is always better to have the power of foresight at hand when planning a patient’s treatment, so as to be ready for anything that could threaten the patient’s well-being.
Case in point - health risk assessment.
This field of healthcare is the one where cutting edge technologies like Convolutional Neural Network Predictive Analytics are widely applied to a maximum effect.
Here’s how Health Risk Assessment CNN works:
- A CNN processes data with a grid topology, that is, a set of spatial correlations between data points. In the case of images, the grid is two-dimensional. In the case of time series or textual data, the grid is one-dimensional.
- Then the convolution algorithm is applied to recognize some aspects of the input.
- It takes into consideration the variations of the input.
- It determines sparse interactions between variables.
- It applies the same parameters across multiple functions of the model (parameter sharing).
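Here is a minimal sketch of these properties on a one-dimensional grid. The heart-rate readings and the `jump_detector` kernel are hypothetical values chosen for illustration.

```python
def conv1d(series, kernel):
    """Sliding-window sum: each output depends only on len(kernel) neighbouring inputs."""
    k = len(kernel)
    return [
        sum(series[start + i] * kernel[i] for i in range(k))
        for start in range(len(series) - k + 1)
    ]


# Hypothetical daily heart-rate readings (a one-dimensional grid).
heart_rate = [62, 64, 63, 90, 92, 91, 65]
# Difference kernel: responds to a jump between consecutive readings.
jump_detector = [-1, 1]

jumps = conv1d(heart_rate, jump_detector)
print(jumps)

# Variations of the input: shifting the series by one step shifts the response by one step.
shifted = [60] + heart_rate[:-1]
print(conv1d(shifted, jump_detector)[1:] == jumps[:-1])

# Parameter sharing: the same 2 weights are reused at every position, whereas a dense
# layer with the same input and output sizes needs one weight per input-output pair.
conv_weights = len(jump_detector)
dense_weights = len(heart_rate) * len(jumps)
print(conv_weights, dense_weights)
```

Sparse interactions show up in the window size: every output looks at just two neighbouring readings, not at the whole series.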
Health Risk Assessment (HRA) is a broad term, so let’s explain its most prominent applications:
- Overall, HRA is a predictive application that calculates the probability of certain events (in this case, disease progression or complications) based on the patient’s data and, for comparison, historical patient data from public health records (PHR). The algorithm looks for similar cases in the PHR, analyzes the patient’s data, finds common patterns, and calculates possible future outcomes. Routine health checks can benefit from using this system.
- The framework can expand by adding the treatment plan. In this case, the prediction is designed to determine the optimal way of treating the symptoms of the disease in terms of time, resources, and patient’s well-being.
- An HRA system can also be used to study a specific environment and explore possible risks for the people working there. This approach is used to assess dangerous environments such as nuclear power stations and factories, as well as the aftermath of ecological disasters. For example, in Australia, officials study sun activity to determine the level of solar radiation threat for the inhabitants.
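The "look for similar cases and calculate the probability" idea from the first point can be sketched with a simple nearest-neighbour estimate. This is a stand-in for illustration, not a CNN and not a clinical method; the records, the two features, and the value of `k` are all hypothetical.

```python
import math


def estimate_risk(patient, history, k=3):
    """Share of the k most similar historical cases that ended in a complication."""
    nearest = sorted(history, key=lambda case: math.dist(patient, case[0]))[:k]
    return sum(outcome for _, outcome in nearest) / k


# Hypothetical records: ([age in decades, systolic pressure / 100], complication 1 or 0).
history = [
    ([3.0, 1.1], 0),
    ([3.5, 1.2], 0),
    ([6.0, 1.6], 1),
    ([6.5, 1.5], 1),
    ([5.5, 1.4], 0),
]

print(estimate_risk([6.2, 1.5], history))  # older patient with higher pressure
print(estimate_risk([3.2, 1.1], history))  # younger patient with lower pressure
```

In a real system the hand-made features would be replaced by representations learned from the patient data, but the output has the same shape: a probability derived from comparable historical cases.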
Drug discovery is another major healthcare field with the extensive use of CNNs. It is also one of the most creative applications of convolutional neural networks in general.
Just like RNNs (Recurrent Neural Networks) in stock market prediction, CNNs in drug discovery are pure data tweaking.
The thing is, drug discovery and development is a lengthy and expensive process. Because of that, better scalability and cost-effectiveness are very important in drug discovery.
The very method of creating new drugs lends itself to neural networks: it produces a lot of data, and this data hides many possibilities and threats to take into consideration during the development of a new drug.
The process of drug discovery involves the following stages:
- Analysis of observed medical effects - this is a clustering and classification problem.
- Hit discovery - that’s where machine learning anomaly detection may come in handy. The algorithm goes through the compound database and tries to uncover new activities for specific purposes.
- Then the selection of results is narrowed down to the most relevant via the Hit to Lead process. That’s dimensionality reduction and regression.
- Next goes Lead Optimization - the process of combining and testing the lead compounds and finding the optimal approaches to them. This stage involves the analysis of the drug’s chemical and physical effects on the organism and also how the living organism acts on the drug.
After that, the development shifts to live testing. Machine learning algorithms take a back seat and are primarily used to structure incoming data.
CNNs significantly streamline and optimize the critical stages of the drug discovery process and make it possible to considerably compress the timeframe for developing cures for emerging diseases.
A similar approach can also be used with existing drugs during the development of a treatment plan for patients. Precision medicine is an emerging subdivision of health risk assessment specifically designed to combine available medical resources and the state of the patient and determine the most effective way of treating the disease.
Precision medicine is an elaborate variation of supply chain management, predictive analytics, and user modeling.
Here’s how it works:
- From the data point of view, the patient is the set of stats that depend on a variety of factors (symptoms and treatments).
- The addition of the variables (types of treatment) causes specific effects in short and long term perspectives.
- Each variable has its own set of stats regarding its effect on a symptom.
- This data is combined to create an assumption of what is the best course of action according to the available information.
- Then various results and changes in the patient’s state are put into perspective. That’s how the assumption is verified. Recurrent neural networks handle this stage as it requires the analysis of the sequences of the data points.
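The first four points can be sketched as a small scoring exercise. The treatments, symptoms, and effect values below are hypothetical placeholders, and the weighted sum is just one simple way to combine them.

```python
# Hypothetical effect table: expected reduction of each symptom per treatment (0 to 1).
EFFECTS = {
    "treatment_a": {"fever": 0.8, "cough": 0.1},
    "treatment_b": {"fever": 0.2, "cough": 0.7},
    "treatment_c": {"fever": 0.5, "cough": 0.5},
}


def expected_relief(symptoms, effects):
    """Weight a treatment's per-symptom effect by how severe each symptom is for the patient."""
    return sum(severity * effects.get(symptom, 0.0) for symptom, severity in symptoms.items())


def best_treatment(symptoms, table=EFFECTS):
    """Pick the treatment with the highest expected relief for this patient."""
    return max(table, key=lambda name: expected_relief(symptoms, table[name]))


# The patient as a set of stats: symptom severity on a 0-10 scale.
patient = {"fever": 2, "cough": 8}
print(best_treatment(patient))
```

The verification step from the last point would then compare this assumption with how the patient’s stats actually change over time.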
Convolutional Neural Networks have the potential of uncovering and describing the hidden and apparent data in an accessible and insightful manner.
Even in their most basic applications, it is impressive how much is possible with the assistance of a neural network.
The way CNNs recognize images says a lot about the composition and execution of the visuals. On the other hand, the use of Convolutional Neural Networks to discover newer and more effective drugs is just one of the many inspiring examples of artificial neural networks making the world a better place.
CNNs contribute to the way we see the world and operate within it - think about how many times you’ve met an interesting person because of a tag on a photo, or how many times you’ve found the thing you’ve been looking for via Google’s visual search.
That’s all CNN in action.