Recurrent Neural Networks

Master sequence learning and temporal data processing

Interactive LSTM visualization, text generation, and hands-on practice

What are Recurrent Neural Networks?

Recurrent Neural Networks (RNNs) are specialized neural networks designed for processing sequential data. Unlike feedforward networks, RNNs have connections that loop back, allowing them to maintain a hidden state that captures information about previous inputs in the sequence. This makes them ideal for tasks involving time series, natural language, and any data where order matters.

Key Features

  • Sequential Processing: Handles ordered data naturally
  • Memory: Maintains hidden state across time steps
  • Variable Length: Processes sequences of any length
  • Temporal Dependencies: Captures relationships over time
  • Parameter Sharing: Same weights across all time steps (illustrated in the sketch below)
  • Backpropagation Through Time: Specialized training algorithm
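
Two of these features, variable-length input and parameter sharing, are easy to see in code. The small sketch below (layer sizes and data are arbitrary, for illustration only) builds a Keras SimpleRNN with an unspecified time dimension, so the same weights process sequences of any length:

import numpy as np
from tensorflow.keras import layers, models

# The time dimension is left as None, so the same layer can
# process sequences of any length with the same weights.
model = models.Sequential([
    layers.SimpleRNN(16, input_shape=(None, 8)),
    layers.Dense(1)
])

# Parameter count depends only on the input and hidden sizes,
# not on sequence length -- weights are shared across time steps.
print(model.count_params())

# Batches with different sequence lengths run through the same model.
print(model.predict(np.random.randn(2, 5, 8)).shape)   # 5 time steps
print(model.predict(np.random.randn(2, 50, 8)).shape)  # 50 time steps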

Applications

  • Natural language processing and translation
  • Speech recognition and synthesis
  • Time series prediction and forecasting
  • Video analysis and action recognition
  • Music generation and composition
  • Sentiment analysis and text classification

RNN Variants

Vanilla RNN

Basic recurrent architecture with simple hidden state updates.

  • Simple architecture
  • Vanishing gradient problem
  • Short-term memory

LSTM

Long Short-Term Memory networks with gating mechanisms.

  • Forget, input, output gates
  • Long-term dependencies
  • Most popular variant

GRU

Gated Recurrent Unit with simplified gating structure.

  • Reset and update gates
  • Fewer parameters than LSTM
  • Faster training (see the Keras sketch below)
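
All three variants share the same layer interface in Keras, so swapping one for another is a one-line change. A minimal sketch (layer sizes are arbitrary) using a GRU in place of an LSTM might look like this:

from tensorflow.keras import layers, models

# GRU uses the same call signature as LSTM/SimpleRNN; internally it
# relies on reset and update gates and has fewer parameters than an LSTM.
gru_model = models.Sequential([
    layers.GRU(64, return_sequences=True, input_shape=(10, 8)),
    layers.GRU(32),
    layers.Dense(10, activation='softmax')
])

gru_model.compile(
    optimizer='adam',
    loss='categorical_crossentropy',
    metrics=['accuracy']
)

gru_model.summary()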

Interactive Sequence Processing

Watch how RNNs process sequential data step by step, maintaining hidden state across time!

[Interactive demo: step through a sequence of time steps and watch the hidden state evolve; the total number of time steps and the hidden state dimension are adjustable]
How RNNs Process Sequences:
  • Input: Each time step receives an input vector
  • Hidden State: Maintains memory from previous time steps
  • Update: h(t) = tanh(W_hh · h(t-1) + W_xh · x(t) + b_h)
  • Output: y(t) = W_hy · h(t) + b_y
  • Backpropagation: Gradients flow backward through time (see the NumPy sketch below)
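
The update and output rules above can be written out directly. The sketch below (NumPy, with toy dimensions and random weights, for illustration only) applies them one time step at a time:

import numpy as np

# Toy dimensions (illustrative only)
input_dim, hidden_dim, output_dim, seq_len = 8, 4, 3, 5

rng = np.random.default_rng(0)
W_xh = rng.normal(scale=0.1, size=(hidden_dim, input_dim))
W_hh = rng.normal(scale=0.1, size=(hidden_dim, hidden_dim))
W_hy = rng.normal(scale=0.1, size=(output_dim, hidden_dim))
b_h = np.zeros(hidden_dim)
b_y = np.zeros(output_dim)

x_seq = rng.normal(size=(seq_len, input_dim))  # one input sequence
h = np.zeros(hidden_dim)                       # initial hidden state

for t, x_t in enumerate(x_seq):
    # h(t) = tanh(W_hh · h(t-1) + W_xh · x(t) + b_h)
    h = np.tanh(W_hh @ h + W_xh @ x_t + b_h)
    # y(t) = W_hy · h(t) + b_y
    y = W_hy @ h + b_y
    print(t, y)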

LSTM Architecture

LSTM Cell Components (combined step by step in the sketch after this list):

1. Cell State (C): Long-term memory that flows through the entire sequence with minimal modifications

2. Hidden State (h): Short-term memory passed to the next time step and used for output

3. Forget Gate (f): Decides what information to discard from cell state: f(t) = σ(W_f · [h(t-1), x(t)] + b_f)

4. Input Gate (i): Decides what new information to store: i(t) = σ(W_i · [h(t-1), x(t)] + b_i)

5. Candidate Values (C̃): New candidate values to add: C̃(t) = tanh(W_C · [h(t-1), x(t)] + b_C)

6. Output Gate (o): Decides what to output: o(t) = σ(W_o · [h(t-1), x(t)] + b_o)
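
Putting these six components together, a single LSTM time step can be sketched directly from the formulas above (NumPy, toy sizes, random weights; for illustration only):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM time step following the gate equations above.
    Each W[k] maps the concatenated [h(t-1), x(t)] to one gate's pre-activation."""
    z = np.concatenate([h_prev, x_t])
    f = sigmoid(W['f'] @ z + b['f'])         # forget gate
    i = sigmoid(W['i'] @ z + b['i'])         # input gate
    c_tilde = np.tanh(W['c'] @ z + b['c'])   # candidate values
    o = sigmoid(W['o'] @ z + b['o'])         # output gate
    c = f * c_prev + i * c_tilde             # new cell state (long-term memory)
    h = o * np.tanh(c)                       # new hidden state (short-term memory)
    return h, c

hidden, inp = 4, 3
rng = np.random.default_rng(0)
W = {k: rng.normal(scale=0.1, size=(hidden, hidden + inp)) for k in 'fico'}
b = {k: np.zeros(hidden) for k in 'fico'}

h, c = np.zeros(hidden), np.zeros(hidden)
for x_t in rng.normal(size=(5, inp)):  # a 5-step toy sequence
    h, c = lstm_step(x_t, h, c, W, b)
print(h, c)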

LSTM Gates Visualization:

Forget Gate

Controls what information to forget from the cell state

f(t) = σ(W_f · [h(t-1), x(t)] + b_f)

Input Gate

Controls what new information to store in the cell state

i(t) = σ(W_i · [h(t-1), x(t)] + b_i)

Output Gate

Controls what information to output based on cell state

o(t) = σ(W_o · [h(t-1), x(t)] + b_o)

LSTM Advantages and Considerations:

Strengths
  • Solves vanishing gradient problem
  • Captures long-term dependencies
  • Selective memory through gates
  • Widely used and well-tested
Considerations
  • More parameters than vanilla RNN
  • Slower training time
  • Requires more memory
  • Can be overkill for simple tasks

Python Implementation

Simple RNN with TensorFlow/Keras

import tensorflow as tf
from tensorflow.keras import layers, models

# Two stacked SimpleRNN layers; the first returns the full sequence
# so the second can process it step by step.
model = models.Sequential([
    layers.SimpleRNN(64, return_sequences=True, input_shape=(10, 8)),
    layers.SimpleRNN(32),
    layers.Dense(10, activation='softmax')
])

model.compile(
    optimizer='adam',
    loss='categorical_crossentropy',
    metrics=['accuracy']
)

model.summary()

LSTM for Text Generation

import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense, Embedding

vocab_size = 10000
embedding_dim = 128
max_sequence_length = 100

model = Sequential([
    Embedding(vocab_size, embedding_dim, input_length=max_sequence_length),
    LSTM(256, return_sequences=True),
    LSTM(128),
    Dense(64, activation='relu'),
    Dense(vocab_size, activation='softmax')
])

model.compile(
    optimizer='adam',
    loss='sparse_categorical_crossentropy',
    metrics=['accuracy']
)

# train_sequences and train_labels are assumed to be prepared beforehand
history = model.fit(
    train_sequences, train_labels,
    epochs=50,
    batch_size=128,
    validation_split=0.2
)
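
Once trained, a model like this generates text one token at a time by feeding each prediction back in as input. The sketch below is one common sampling loop, not part of the snippet above; it assumes a fitted Keras `Tokenizer` (called `tokenizer` here, hypothetical) and the `max_sequence_length` defined above:

import numpy as np
from tensorflow.keras.preprocessing.sequence import pad_sequences

def generate_text(model, tokenizer, seed_text, num_words=20, temperature=1.0):
    """Sample one token at a time, appending each prediction to the prompt.
    `tokenizer` is assumed to be a fitted keras Tokenizer (hypothetical here)."""
    text = seed_text
    for _ in range(num_words):
        seq = tokenizer.texts_to_sequences([text])[0]
        seq = pad_sequences([seq], maxlen=max_sequence_length)
        probs = model.predict(seq, verbose=0)[0]
        # Temperature scaling: lower values make sampling more conservative
        probs = np.log(probs + 1e-9) / temperature
        probs = np.exp(probs) / np.sum(np.exp(probs))
        next_id = np.random.choice(len(probs), p=probs)
        text += ' ' + tokenizer.index_word.get(next_id, '')
    return text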

Time Series Prediction with LSTM

import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense, Dropout

def create_sequences(data, sequence_length):
    """Slice a series into overlapping input windows and next-step targets."""
    sequences = []
    targets = []
    for i in range(len(data) - sequence_length):
        sequences.append(data[i:i+sequence_length])
        targets.append(data[i+sequence_length])
    return np.array(sequences), np.array(targets)

# time_series_data is assumed to be a 1-D NumPy array of observations
scaler = MinMaxScaler()
scaled_data = scaler.fit_transform(time_series_data.reshape(-1, 1))

sequence_length = 60
X_train, y_train = create_sequences(scaled_data, sequence_length)

model = Sequential([
    LSTM(50, return_sequences=True, input_shape=(sequence_length, 1)),
    Dropout(0.2),
    LSTM(50, return_sequences=False),
    Dropout(0.2),
    Dense(25),
    Dense(1)
])

model.compile(optimizer='adam', loss='mean_squared_error')
model.fit(X_train, y_train, epochs=100, batch_size=32, validation_split=0.1)

# X_test is assumed to be built the same way as X_train
predictions = model.predict(X_test)
predictions = scaler.inverse_transform(predictions)
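
The snippet above predicts one step ahead. To forecast further, a common approach (not shown in the original snippet) is to feed each prediction back into the input window; a hedged sketch reusing the `model`, `scaler`, and `sequence_length` defined above:

import numpy as np

def forecast(model, recent_window, n_steps, scaler):
    """Iteratively predict n_steps into the future.
    `recent_window` is the last `sequence_length` scaled values, shape (sequence_length, 1)."""
    window = recent_window.copy()
    preds = []
    for _ in range(n_steps):
        next_scaled = model.predict(window[np.newaxis, :, :], verbose=0)[0, 0]
        preds.append(next_scaled)
        # Slide the window forward: drop the oldest value, append the prediction
        window = np.vstack([window[1:], [[next_scaled]]])
    return scaler.inverse_transform(np.array(preds).reshape(-1, 1))

# Example: forecast 10 future points from the last training window (assumed available)
# future = forecast(model, scaled_data[-sequence_length:], n_steps=10, scaler=scaler)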

Bidirectional LSTM

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense, Bidirectional

# Bidirectional wrappers run the sequence forwards and backwards
# and concatenate the two hidden states.
model = Sequential([
    Bidirectional(LSTM(64, return_sequences=True), input_shape=(10, 8)),
    Bidirectional(LSTM(32)),
    Dense(10, activation='softmax')
])

model.compile(
    optimizer='adam',
    loss='categorical_crossentropy',
    metrics=['accuracy']
)

Test Your Knowledge

Answer these questions to test your understanding of Recurrent Neural Networks.

Question 1: What makes RNNs different from feedforward neural networks?
A) RNNs have recurrent connections that allow them to maintain hidden state across time steps
B) RNNs have more layers than feedforward networks
C) RNNs use different activation functions
D) RNNs are only used for image processing

Question 2: What problem do LSTMs solve that vanilla RNNs struggle with?
A) Overfitting
B) Vanishing gradient problem and long-term dependencies
C) Slow training speed
D) High memory usage

Question 3: How many gates does an LSTM cell have?
A) 2 gates
B) 3 gates (forget, input, output)
C) 4 gates
D) 5 gates

Question 4: What is the purpose of the forget gate in LSTM?
A) To add new information to the cell state
B) To generate the output
C) To decide what information to discard from the cell state
D) To normalize the hidden state

Question 5: Which of the following is a common application of RNNs?
A) Image classification
B) Natural language processing and machine translation
C) Object detection
D) Clustering

Question 6: What is the main advantage of GRU over LSTM?
A) Better accuracy
B) Fewer parameters and faster training
C) Longer memory
D) More gates