Convolutional Neural Networks

Master image processing and feature extraction with deep learning

Interactive convolution operations, filter visualizations, and hands-on practice

What are Convolutional Neural Networks?

Convolutional Neural Networks (CNNs) are specialized deep learning architectures designed for processing grid-like data such as images. They use convolution operations to automatically learn spatial hierarchies of features, making them highly effective for computer vision tasks.

Key Components

  • Convolutional Layers: Extract features using filters
  • Pooling Layers: Reduce spatial dimensions
  • Activation Functions: Introduce non-linearity (ReLU)
  • Fully Connected Layers: Final classification
  • Filters/Kernels: Learnable feature detectors
  • Feature Maps: Output of convolution operations

Applications

  • Image classification and recognition
  • Object detection and localization
  • Facial recognition systems
  • Medical image analysis
  • Autonomous vehicle vision
  • Video analysis and tracking

CNN Layer Types

Convolutional Layer

Applies filters to extract features like edges, textures, and patterns from input images.

  • Learnable filters
  • Stride & padding
  • Feature maps output

Pooling Layer

Reduces spatial dimensions while retaining important features through downsampling.

  • Max pooling
  • Average pooling
  • Dimension reduction
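The downsampling idea can be sketched in a few lines of NumPy. The `max_pool2d` helper below is illustrative (not part of any library): it slides a 2×2 window over a feature map and keeps only the maximum value in each window.

```python
import numpy as np

def max_pool2d(feature_map, pool_size=2, stride=2):
    """Downsample by taking the max over each pooling window."""
    h, w = feature_map.shape
    out_h = (h - pool_size) // stride + 1
    out_w = (w - pool_size) // stride + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            window = feature_map[i * stride:i * stride + pool_size,
                                 j * stride:j * stride + pool_size]
            out[i, j] = window.max()  # max pooling keeps the strongest activation
    return out

fm = np.array([[1, 3, 2, 4],
               [5, 6, 1, 2],
               [7, 2, 9, 1],
               [3, 4, 5, 6]])
pooled = max_pool2d(fm)
print(pooled)  # 4x4 map reduced to 2x2: [[6. 4.], [7. 9.]]
```

Note how each output value summarizes a 2×2 region, halving both spatial dimensions while keeping the strongest activations.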

Fully Connected

Connects all neurons from previous layer for final classification decisions.

  • Dense connections
  • Classification layer
  • Softmax output
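The softmax output mentioned above turns the raw scores of the final dense layer into class probabilities. A minimal sketch (the logit values are made up for illustration):

```python
import numpy as np

def softmax(logits):
    # subtract the max logit for numerical stability before exponentiating
    exps = np.exp(logits - np.max(logits))
    return exps / exps.sum()

scores = np.array([2.0, 1.0, 0.1])  # hypothetical raw class scores
probs = softmax(scores)
print(probs)        # largest logit gets the largest probability
print(probs.sum())  # probabilities sum to 1
```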

Interactive Convolution Operation

Watch how convolution filters extract features from input images in real-time!

Example setup: a 6×6 input matrix convolved with a 3×3 filter/kernel (9 learnable parameters) produces a 4×4 feature map.
How Convolution Works:
  • Slide Filter: Move the filter across the input matrix
  • Element-wise Multiplication: Multiply overlapping values
  • Sum Results: Add all products to get output value
  • Stride: Number of pixels to move the filter
  • Feature Detection: Different filters detect different features
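One step of the process above can be sketched directly: at each filter position, the overlapping input patch and the kernel are multiplied element-wise and summed to produce a single feature-map value. The patch and kernel values here are chosen purely for illustration.

```python
import numpy as np

# one 3x3 patch of the input currently overlapped by the filter
patch = np.array([[1, 0, 1],
                  [0, 1, 0],
                  [1, 0, 1]])

# a 3x3 vertical-edge-style kernel (illustrative values)
kernel = np.array([[1, 0, -1],
                   [1, 0, -1],
                   [1, 0, -1]])

# element-wise multiplication, then sum: one value of the feature map
value = np.sum(patch * kernel)
print(value)  # 0: this patch is symmetric, so the edge filter cancels out
```

Sliding the filter to every valid position and repeating this multiply-and-sum yields the full feature map.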

CNN Architecture

Typical CNN Architecture Flow:

1. Input Layer: Raw pixel values of the image (e.g., 224×224×3 for RGB)
2. Convolutional Layers: Apply multiple filters to extract features at different levels
3. Activation Function (ReLU): Introduce non-linearity: f(x) = max(0, x)
4. Pooling Layer: Downsample feature maps to reduce dimensions
5. Flatten Layer: Convert 2D feature maps to a 1D vector
6. Fully Connected Layers: Dense layers for final classification
7. Output Layer: Softmax activation for class probabilities

Popular CNN Architectures:

  • LeNet-5 (1998): digit recognition, 7 layers
  • AlexNet (2012): ImageNet winner, 8 layers
  • VGGNet (2014): deep architecture, 16-19 layers
  • ResNet (2015): skip connections, 50-152 layers
  • Inception (2014): multi-scale modules, 22 layers

Output Size Calculation:

Output Size = ((Input Size - Filter Size + 2×Padding) / Stride) + 1

Example:

  • Input: 32×32, Filter: 5×5, Stride: 1, Padding: 0
  • Output: ((32 - 5 + 0) / 1) + 1 = 28×28
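The formula above translates directly into a small helper function. This `conv_output_size` function is an illustrative sketch, not part of any library:

```python
def conv_output_size(input_size, filter_size, padding=0, stride=1):
    """Output Size = ((Input Size - Filter Size + 2*Padding) / Stride) + 1"""
    return (input_size - filter_size + 2 * padding) // stride + 1

# the example from above: 32x32 input, 5x5 filter, stride 1, no padding
print(conv_output_size(32, 5))             # 28, i.e. a 28x28 output
# padding of 2 preserves the spatial size for a 5x5 filter
print(conv_output_size(32, 5, padding=2))  # 32
```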

Python Implementation

Simple CNN with TensorFlow/Keras

import tensorflow as tf
from tensorflow.keras import layers, models

# Load MNIST, scale pixels to [0, 1], and add a channel dimension
(train_images, train_labels), (test_images, test_labels) = \
    tf.keras.datasets.mnist.load_data()
train_images = train_images[..., None] / 255.0
test_images = test_images[..., None] / 255.0

model = models.Sequential([
    layers.Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation='relu'),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation='relu'),
    layers.Flatten(),
    layers.Dense(64, activation='relu'),
    layers.Dense(10, activation='softmax')
])

model.compile(
    optimizer='adam',
    loss='sparse_categorical_crossentropy',
    metrics=['accuracy']
)
model.summary()

history = model.fit(
    train_images, train_labels,
    epochs=10,
    validation_data=(test_images, test_labels)
)

test_loss, test_acc = model.evaluate(test_images, test_labels)
print(f'Test accuracy: {test_acc:.4f}')

Custom Convolution Operation

import numpy as np

def convolve2d(image, kernel, stride=1):
    """Slide the kernel over the image and multiply-and-sum at each position."""
    image_height, image_width = image.shape
    kernel_height, kernel_width = kernel.shape
    output_height = (image_height - kernel_height) // stride + 1
    output_width = (image_width - kernel_width) // stride + 1
    output = np.zeros((output_height, output_width))
    for i in range(0, image_height - kernel_height + 1, stride):
        for j in range(0, image_width - kernel_width + 1, stride):
            region = image[i:i + kernel_height, j:j + kernel_width]
            output[i // stride, j // stride] = np.sum(region * kernel)
    return output

# classic edge-detection kernel
edge_detection_kernel = np.array([
    [-1, -1, -1],
    [-1,  8, -1],
    [-1, -1, -1]
])

input_image = np.random.rand(6, 6)
feature_map = convolve2d(input_image, edge_detection_kernel)
print(f'Input shape: {input_image.shape}')    # (6, 6)
print(f'Output shape: {feature_map.shape}')   # (4, 4)

Test Your Knowledge

Answer these questions to test your understanding of Convolutional Neural Networks.

Question 1: What is the primary purpose of convolutional layers in CNNs?
A) Extract spatial features from input images
B) Reduce the number of parameters
C) Classify the final output
D) Normalize the input data
Question 2: What does pooling do in a CNN?
A) Increases the spatial dimensions
B) Reduces spatial dimensions while retaining important features
C) Adds more filters to the network
D) Performs classification
Question 3: What is a filter/kernel in CNNs?
A) The final output layer
B) A small matrix of learnable weights used for feature detection
C) The activation function
D) The loss function
Question 4: What is stride in convolution operations?
A) The size of the filter
B) The number of filters
C) The number of pixels the filter moves at each step
D) The learning rate
Question 5: Which activation function is most commonly used in CNNs?
A) Sigmoid
B) Tanh
C) ReLU (Rectified Linear Unit)
D) Linear
Question 6: What is the purpose of the flatten layer?
A) To reduce overfitting
B) To convert 2D feature maps into a 1D vector for fully connected layers
C) To normalize the data
D) To increase the number of parameters