Convolutional Neural Networks

Master image processing and feature extraction with deep learning

Interactive convolution operations, filter visualizations, and hands-on practice

What are Convolutional Neural Networks?

Convolutional Neural Networks (CNNs) are specialized deep learning architectures designed for processing grid-like data such as images. They use convolution operations to automatically learn spatial hierarchies of features, making them highly effective for computer vision tasks.

Key Components

  • Convolutional Layers: Extract features using filters
  • Pooling Layers: Reduce spatial dimensions
  • Activation Functions: Introduce non-linearity (ReLU)
  • Fully Connected Layers: Final classification
  • Filters/Kernels: Learnable feature detectors
  • Feature Maps: Output of convolution operations

Applications

  • Image classification and recognition
  • Object detection and localization
  • Facial recognition systems
  • Medical image analysis
  • Autonomous vehicle vision
  • Video analysis and tracking

CNN Layer Types

Convolutional Layer

Applies filters to extract features like edges, textures, and patterns from input images.

  • Learnable filters
  • Stride & padding
  • Feature maps output

Pooling Layer

Reduces spatial dimensions while retaining important features through downsampling.

  • Max pooling
  • Average pooling
  • Dimension reduction
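The downsampling idea can be sketched in a few lines of NumPy. The `max_pool2d` helper below is illustrative (not part of any library): it slides a 2×2 window over a feature map and keeps only the maximum value in each window.

```python
import numpy as np

def max_pool2d(feature_map, pool_size=2, stride=2):
    """Downsample by taking the max over each pooling window."""
    h, w = feature_map.shape
    out_h = (h - pool_size) // stride + 1
    out_w = (w - pool_size) // stride + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            window = feature_map[i * stride:i * stride + pool_size,
                                 j * stride:j * stride + pool_size]
            out[i, j] = window.max()  # max pooling keeps the strongest activation
    return out

fm = np.array([[1, 3, 2, 4],
               [5, 6, 1, 2],
               [7, 2, 9, 1],
               [3, 4, 5, 6]])
pooled = max_pool2d(fm)
print(pooled)  # 4x4 map reduced to 2x2: [[6. 4.], [7. 9.]]
```

Note how each output value summarizes a 2×2 region, halving both spatial dimensions while keeping the strongest activations.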

Fully Connected

Connects all neurons from previous layer for final classification decisions.

  • Dense connections
  • Classification layer
  • Softmax output
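The softmax output mentioned above turns the raw scores of the final dense layer into class probabilities. A minimal sketch (the logit values are made up for illustration):

```python
import numpy as np

def softmax(logits):
    # subtract the max logit for numerical stability before exponentiating
    exps = np.exp(logits - np.max(logits))
    return exps / exps.sum()

scores = np.array([2.0, 1.0, 0.1])  # hypothetical raw class scores
probs = softmax(scores)
print(probs)        # largest logit gets the largest probability
print(probs.sum())  # probabilities sum to 1
```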

Interactive Convolution Operation

Watch how convolution filters extract features from input images in real-time!

Example setup: a 6×6 input matrix convolved with a 3×3 filter/kernel (9 learnable parameters) produces a 4×4 feature map.
How Convolution Works:
  • Slide Filter: Move the filter across the input matrix
  • Element-wise Multiplication: Multiply overlapping values
  • Sum Results: Add all products to get output value
  • Stride: Number of pixels to move the filter
  • Feature Detection: Different filters detect different features
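One step of the process above can be sketched directly: at each filter position, the overlapping input patch and the kernel are multiplied element-wise and summed to produce a single feature-map value. The patch and kernel values here are chosen purely for illustration.

```python
import numpy as np

# one 3x3 patch of the input currently overlapped by the filter
patch = np.array([[1, 0, 1],
                  [0, 1, 0],
                  [1, 0, 1]])

# a 3x3 vertical-edge-style kernel (illustrative values)
kernel = np.array([[1, 0, -1],
                   [1, 0, -1],
                   [1, 0, -1]])

# element-wise multiplication, then sum: one value of the feature map
value = np.sum(patch * kernel)
print(value)  # 0: this patch is symmetric, so the edge filter cancels out
```

Sliding the filter to every valid position and repeating this multiply-and-sum yields the full feature map.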

CNN Architecture

Typical CNN Architecture Flow:

1. Input Layer: Raw pixel values of the image (e.g., 224×224×3 for RGB)
2. Convolutional Layers: Apply multiple filters to extract features at different levels
3. Activation Function (ReLU): Introduce non-linearity: f(x) = max(0, x)
4. Pooling Layer: Downsample feature maps to reduce dimensions
5. Flatten Layer: Convert 2D feature maps to a 1D vector
6. Fully Connected Layers: Dense layers for final classification
7. Output Layer: Softmax activation for class probabilities

Popular CNN Architectures:

  • LeNet-5 (1998): digit recognition, 7 layers
  • AlexNet (2012): ImageNet winner, 8 layers
  • VGGNet (2014): deep architecture, 16-19 layers
  • ResNet (2015): skip connections, 50-152 layers
  • Inception (2014): multi-scale modules, 22 layers

Output Size Calculation:

Output Size = ((Input Size - Filter Size + 2×Padding) / Stride) + 1

Example:

  • Input: 32×32, Filter: 5×5, Stride: 1, Padding: 0
  • Output: ((32 - 5 + 0) / 1) + 1 = 28×28
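The formula above translates directly into a small helper function. This `conv_output_size` function is an illustrative sketch, not part of any library:

```python
def conv_output_size(input_size, filter_size, padding=0, stride=1):
    """Output Size = ((Input Size - Filter Size + 2*Padding) / Stride) + 1"""
    return (input_size - filter_size + 2 * padding) // stride + 1

# the example from above: 32x32 input, 5x5 filter, stride 1, no padding
print(conv_output_size(32, 5))             # 28, i.e. a 28x28 output
# padding of 2 preserves the spatial size for a 5x5 filter
print(conv_output_size(32, 5, padding=2))  # 32
```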

Python Implementation

Simple CNN with TensorFlow/Keras

import tensorflow as tf
from tensorflow.keras import layers, models

# Load MNIST, scale pixels to [0, 1], and add a channel dimension
(train_images, train_labels), (test_images, test_labels) = \
    tf.keras.datasets.mnist.load_data()
train_images = train_images[..., None] / 255.0
test_images = test_images[..., None] / 255.0

model = models.Sequential([
    layers.Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation='relu'),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation='relu'),
    layers.Flatten(),
    layers.Dense(64, activation='relu'),
    layers.Dense(10, activation='softmax')
])

model.compile(
    optimizer='adam',
    loss='sparse_categorical_crossentropy',
    metrics=['accuracy']
)
model.summary()

history = model.fit(
    train_images, train_labels,
    epochs=10,
    validation_data=(test_images, test_labels)
)

test_loss, test_acc = model.evaluate(test_images, test_labels)
print(f'Test accuracy: {test_acc:.4f}')

Custom Convolution Operation

import numpy as np

def convolve2d(image, kernel, stride=1):
    """Slide the kernel over the image and multiply-and-sum at each position."""
    image_height, image_width = image.shape
    kernel_height, kernel_width = kernel.shape
    output_height = (image_height - kernel_height) // stride + 1
    output_width = (image_width - kernel_width) // stride + 1
    output = np.zeros((output_height, output_width))
    for i in range(0, image_height - kernel_height + 1, stride):
        for j in range(0, image_width - kernel_width + 1, stride):
            region = image[i:i + kernel_height, j:j + kernel_width]
            output[i // stride, j // stride] = np.sum(region * kernel)
    return output

# classic edge-detection kernel
edge_detection_kernel = np.array([
    [-1, -1, -1],
    [-1,  8, -1],
    [-1, -1, -1]
])

input_image = np.random.rand(6, 6)
feature_map = convolve2d(input_image, edge_detection_kernel)
print(f'Input shape: {input_image.shape}')    # (6, 6)
print(f'Output shape: {feature_map.shape}')   # (4, 4)

Test Your Knowledge

Answer these questions to test your understanding of Convolutional Neural Networks.

Question 1: What is the primary purpose of convolutional layers in CNNs?
A) Extract spatial features from input images
B) Reduce the number of parameters
C) Classify the final output
D) Normalize the input data
Question 2: What does pooling do in a CNN?
A) Increases the spatial dimensions
B) Reduces spatial dimensions while retaining important features
C) Adds more filters to the network
D) Performs classification
Question 3: What is a filter/kernel in CNNs?
A) The final output layer
B) A small matrix of learnable weights used for feature detection
C) The activation function
D) The loss function
Question 4: What is stride in convolution operations?
A) The size of the filter
B) The number of filters
C) The number of pixels the filter moves at each step
D) The learning rate
Question 5: Which activation function is most commonly used in CNNs?
A) Sigmoid
B) Tanh
C) ReLU (Rectified Linear Unit)
D) Linear
Question 6: What is the purpose of the flatten layer?
A) To reduce overfitting
B) To convert 2D feature maps into a 1D vector for fully connected layers
C) To normalize the data
D) To increase the number of parameters