Introduction to Convolutional Neural Networks

Diego Junco
Publication Date
3 June 2021

AI neural networks

The development of artificial intelligence (AI) and neural networks, together with the technological breakthroughs behind them, is having a profound impact on our lives. As this science becomes part of everyday life, it is driving one of the major technological revolutions in our history.

There are several types of neural networks. One of them, the convolutional neural network (CNN), imitates the behavior of neurons in the visual cortex of a biological brain and is key to this revolution. Although the fundamentals of CNNs are based on the Neocognitron, introduced by Kunihiko Fukushima in 1980, their breakthrough began in 2012, when CNNs were implemented on graphics processing units (GPUs) and achieved impressive results.

What is a CNN?

Most modern deep learning models are based on artificial neural networks, and CNNs are among the most widely used. A convolutional neural network is a type of deep neural network most commonly applied to analyzing visual imagery. CNNs have applications in image and video recognition, recommender systems, image classification, image segmentation, medical image analysis, natural language processing, brain-computer interfaces, and financial time series.







The figure shows a representation of a convolutional neural network, where an image is used as the input and the result is a detected category. Each CNN layer processes its input and passes the result to the next layer, similar to the way a neuron in the visual cortex responds to a specific stimulus.

The core of CNN: the convolution

The core of a CNN is the convolution, but what is a convolution? In mathematics, a convolution is an operation that combines two functions to produce a third function. We can think of images as two-dimensional functions. Many important image transformations are convolutions in which the image function is convolved with a very small, local function called a “kernel.” The kernel slides to every position of the image and computes a new pixel as a weighted sum of the pixels it covers, which lets it capture the spatial structure of the image.
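The sliding weighted sum described above can be sketched in a few lines of plain Python. This is a minimal illustration, not how production libraries implement it: the function name `convolve2d`, the 4x4 sample image, and the all-ones kernel are assumptions made for the example, and no padding is used. Like most deep learning libraries, the sketch does not flip the kernel, which a strict mathematical convolution would do.

```python
def convolve2d(image, kernel):
    """Slide `kernel` over `image` (both lists of rows) and return the feature map.

    No padding is used, so the output is smaller than the input.
    """
    k_rows, k_cols = len(kernel), len(kernel[0])
    out_rows = len(image) - k_rows + 1
    out_cols = len(image[0]) - k_cols + 1
    feature_map = []
    for i in range(out_rows):
        row = []
        for j in range(out_cols):
            # Weighted sum of the pixels currently under the kernel.
            total = 0.0
            for di in range(k_rows):
                for dj in range(k_cols):
                    total += image[i + di][j + dj] * kernel[di][dj]
            row.append(total)
        feature_map.append(row)
    return feature_map


# Hypothetical 4x4 grayscale image.
image = [
    [0, 1, 2, 3],
    [4, 5, 6, 7],
    [8, 9, 10, 11],
    [12, 13, 14, 15],
]
# Hypothetical 3x3 kernel that sums a neighbourhood (all weights equal to 1).
sum_kernel = [
    [1, 1, 1],
    [1, 1, 1],
    [1, 1, 1],
]

feature_map = convolve2d(image, sum_kernel)
print(feature_map)  # [[45.0, 54.0], [81.0, 90.0]]
```

A 3x3 kernel fits in only four positions on a 4x4 image, which is why the feature map here is 2x2.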







Image source: River Trail documentation








In the animation we have an input image and a kernel which sweeps through the input image to generate an output image called a feature map.

Convolutions are used to detect simple patterns, such as edges, in an input image. For example, we can detect edges with a kernel that has the values −1 and 1 on two adjacent pixels and zero everywhere else. That is, we subtract two adjacent pixels. When side-by-side pixels are similar, the result is approximately zero. On edges, however, adjacent pixels are very different in the direction perpendicular to the edge, so the result is large.
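As a small worked example of this idea, the sketch below applies the kernel [−1, 1] along each row of a tiny image. The function name `horizontal_differences` and the pixel values are assumptions chosen for illustration: a dark region on the left, a bright region on the right, and a vertical edge between the third and fourth columns.

```python
def horizontal_differences(image):
    """Apply the kernel [-1, 1] along each row: right pixel minus left pixel."""
    return [
        [-1 * row[j] + 1 * row[j + 1] for j in range(len(row) - 1)]
        for row in image
    ]


# Hypothetical 3x6 grayscale image: dark (about 10) on the left,
# bright (about 200) on the right.
image = [
    [10, 10, 12, 200, 200, 198],
    [10, 11, 10, 200, 199, 200],
    [10, 10, 10, 200, 200, 200],
]

edges = horizontal_differences(image)
for row in edges:
    print(row)
# [0, 2, 188, 0, -2]
# [1, -1, 190, -1, 1]
# [0, 0, 190, 0, 0]
```

The values stay close to zero inside the flat regions and jump to about 190 exactly where the dark and bright regions meet, which is the edge.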


Image source: Image Kernels
Bottom and right Sobel kernels.

An interactive demo explaining how image kernels work can be seen in this post by Victor Powell.

The multilayered architecture of CNNs

A CNN has a multilayered architecture in which an image is used as the input and multiple layers of convolutions create feature maps. In the inner layers, a pooling step reduces the resolution of the feature maps, while the number of feature maps typically grows from one layer to the next. The final layer of feature maps is passed through a linear classifier, which produces a one-dimensional array called the output layer, with one element for each tag the image could be associated with. When an element of this array is activated, the image has been classified with the associated tag.
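The pooling and classification steps can be sketched as follows. This is an illustrative toy under stated assumptions, not a full network: the function names, the 4x4 feature map, the two tags, and the hand-picked weights are all hypothetical, and 2x2 max pooling is just one common choice of pooling.

```python
def max_pool_2x2(feature_map):
    """Halve the resolution by keeping the largest value in each 2x2 block."""
    rows, cols = len(feature_map), len(feature_map[0])
    return [
        [
            max(
                feature_map[i][j], feature_map[i][j + 1],
                feature_map[i + 1][j], feature_map[i + 1][j + 1],
            )
            for j in range(0, cols - 1, 2)
        ]
        for i in range(0, rows - 1, 2)
    ]


def linear_classifier(features, weights, biases):
    """Return one score per tag: a weighted sum of the features plus a bias."""
    return [
        sum(w * x for w, x in zip(tag_weights, features)) + bias
        for tag_weights, bias in zip(weights, biases)
    ]


# Hypothetical 4x4 feature map produced by an earlier convolution.
feature_map = [
    [1, 3, 2, 0],
    [4, 2, 1, 1],
    [0, 0, 5, 6],
    [1, 2, 7, 8],
]
pooled = max_pool_2x2(feature_map)             # 2x2 result
features = [v for row in pooled for v in row]  # flatten to one dimension

# Hypothetical tags and hand-picked weights; a real network learns the weights.
tags = ["top-left", "bottom-right"]
weights = [[1, 0, 0, 0], [0, 0, 0, 1]]
biases = [0, 0]

scores = linear_classifier(features, weights, biases)  # the output layer
predicted = tags[scores.index(max(scores))]
print(pooled)     # [[4, 2], [2, 8]]
print(scores)     # [4, 8]
print(predicted)  # bottom-right
```

The element of the output layer with the highest score decides the tag, which corresponds to the "activated" element described above.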

CNNs can detect complex patterns such as handwritten numbers because the architecture performs detections on top of previous detections (the input of an inner layer is the output of the previous one). How do we define which convolutions to apply to solve a specific problem? The neural network does this during training, using a set of training images that each have an associated tag or category. Training adjusts the kernel values until the network finds the combination of convolutions that best classifies the training dataset.
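To make the idea of learning a convolution concrete, here is a deliberately simplified sketch. It is not how a real CNN is trained (which uses backpropagation through many layers and a classification loss). It only shows a single two-value kernel being adjusted by gradient descent until its output matches the desired output on training examples. All names, the random one-dimensional "images", the learning rate, and the number of epochs are assumptions made for the example.

```python
import random


def apply_kernel(signal, kernel):
    """Slide a 1D kernel over a 1D signal and return the weighted sums."""
    k = len(kernel)
    return [
        sum(signal[i + d] * kernel[d] for d in range(k))
        for i in range(len(signal) - k + 1)
    ]


random.seed(0)

# Hypothetical training set: random 1D "images" paired with the output we
# want, here the edge response of the kernel [-1, 1] (right minus left).
signals = [[random.random() for _ in range(8)] for _ in range(50)]
targets = [apply_kernel(s, [-1.0, 1.0]) for s in signals]

kernel = [0.0, 0.0]   # the network starts without knowing the kernel
learning_rate = 0.1   # assumed value, chosen to keep the updates stable

for epoch in range(200):
    for signal, target in zip(signals, targets):
        output = apply_kernel(signal, kernel)
        # Gradient of the mean squared error with respect to each kernel weight.
        grad = [0.0, 0.0]
        for i, (o, t) in enumerate(zip(output, target)):
            error = o - t
            for d in range(2):
                grad[d] += 2 * error * signal[i + d] / len(output)
        # Move each weight a small step in the direction that reduces the error.
        kernel = [w - learning_rate * g for w, g in zip(kernel, grad)]

print([round(w, 3) for w in kernel])  # close to [-1.0, 1.0]
```

Nobody tells the loop what the kernel should be. It recovers values close to −1 and 1 purely from the training examples, which is the same principle a CNN applies to thousands of kernels at once.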


A convolutional neural network consists of an input layer, hidden layers and an output layer. The hidden layers include layers that perform convolutions and create feature maps.

Input layer, convolutional/hidden layers and output layer of a convolutional neural network that detects handwritten numbers.

Examples of CNN in use

CNNs can be used in many different applications, including realistic face generation, super resolution, style transfer, and the colorization of black and white pictures. They can also be used within digital solutions to enable greater personalisation and prediction features. Mobiquity has used AI and machine learning in a number of interesting projects.

The first is a project with a leading oil and gas provider, in which Mobiquity was able to leverage AI and machine learning to reduce costs and CO2 emissions. Together with the client, Mobiquity created a digital twin that uses machine learning to accurately model the working parts of the client's plants. This approach can predict how a plant will evolve over time and accurately forecast its energy consumption. By implementing this solution, the client improved its electricity consumption forecast by 50%, which significantly reduced its costs and enabled it to reduce its CO2 emissions.

Another inventive way in which Mobiquity has used machine learning was to create a personalised event experience for a client. A major technology company that hosts large events wanted to improve its attendees' experience through its event app. Mobiquity worked with the client to add personalisation features to the existing app, using Amazon Personalize machine learning. These enhancements resulted in improved attendee engagement and satisfaction, more effective scheduling, and contributed to an overall successful event.

Remarkable models based on CNNs


Image source: This person doesn't exist, Arxiv, Github AI Art, Medium

The future of CNNs

Ever since the breakthrough of CNNs in 2012, the evolution of deep learning techniques has accelerated rapidly, ranging from image classification models used by self-driving vehicles to text generation models like GPT-3 that produce human-like text. For that reason, we can expect deep learning tools to continue evolving over the next 5 to 10 years, to the point where this kind of technology is widely democratised and becomes a standard tool for everyone, with a ubiquitous presence in our daily lives.

