Learn through interaction, rewards, and exploration
Interactive Q-Learning simulation, grid world environment, and hands-on practice
Reinforcement Learning (RL) is a type of machine learning where an agent learns to make decisions by interacting with an environment. The agent receives rewards or penalties based on its actions and learns to maximize cumulative rewards over time through trial and error.
Model-based: the agent learns a model of the environment and uses it for planning.
Model-free (value-based): the agent learns directly from experience, without modeling the environment.
Policy-based: the agent directly optimizes the policy, without learning a value function first.
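Whichever approach is used, the underlying loop is the same: observe a state, act, receive a reward, repeat. Below is a minimal sketch of that loop, assuming a Gymnasium-style environment; the FrozenLake-v1 environment id and the random placeholder policy are illustrative choices, not part of the simulation on this page.

# Minimal agent-environment interaction loop (a sketch, assuming the
# Gymnasium API; the environment id and random policy are illustrative).
import gymnasium as gym

env = gym.make("FrozenLake-v1")          # any discrete environment works
state, _ = env.reset(seed=0)
total_reward = 0.0

for step in range(100):
    action = env.action_space.sample()   # placeholder policy: act randomly
    state, reward, terminated, truncated, _ = env.step(action)
    total_reward += reward               # the agent tries to maximize this
    if terminated or truncated:
        state, _ = env.reset()

print("cumulative reward:", total_reward)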
Watch the agent learn to navigate from start to goal while avoiding obstacles and collecting rewards!
Create a table Q(s,a) with zeros for all state-action pairs
Select action using ε-greedy policy: explore randomly or exploit best known action
Execute action, observe next state and reward from environment
Q(s,a) ← Q(s,a) + α[r + γ max_a' Q(s',a') − Q(s,a)]
Continue until convergence or maximum episodes reached
Where: α is the learning rate, γ is the discount factor, r is the immediate reward, s' is the next state, and max_a' Q(s',a') is the value of the best action available from s'.
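Putting the five steps together, here is a compact tabular Q-learning sketch on a small grid world. The 4x4 layout, reward values, and hyperparameters are illustrative assumptions, not the values used by the interactive simulation above.

# Tabular Q-learning on a tiny grid world (a sketch; the grid layout,
# rewards, and hyperparameters are illustrative, not the demo's values).
import numpy as np

rng = np.random.default_rng(0)
SIZE, GOAL, OBSTACLE = 4, (3, 3), (1, 1)
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]        # up, down, left, right
alpha, gamma, epsilon, episodes = 0.1, 0.9, 0.1, 500

Q = np.zeros((SIZE * SIZE, len(ACTIONS)))           # 1. table of zeros

def step(state, action):
    r, c = divmod(state, SIZE)
    dr, dc = ACTIONS[action]
    nr, nc = min(max(r + dr, 0), SIZE - 1), min(max(c + dc, 0), SIZE - 1)
    if (nr, nc) == OBSTACLE:                        # bump into the obstacle
        return state, -1.0, False
    if (nr, nc) == GOAL:                            # reach the goal
        return nr * SIZE + nc, 10.0, True
    return nr * SIZE + nc, -0.1, False              # small step penalty

for _ in range(episodes):
    s = 0                                           # start in the corner
    for _ in range(200):                            # cap episode length
        # 2. epsilon-greedy: explore randomly or exploit the best action
        a = rng.integers(len(ACTIONS)) if rng.random() < epsilon else int(np.argmax(Q[s]))
        s_next, r, done = step(s, a)                # 3. act, observe r and s'
        # 4. Q(s,a) <- Q(s,a) + alpha * [r + gamma * max_a' Q(s',a') - Q(s,a)]
        Q[s, a] += alpha * (r + gamma * np.max(Q[s_next]) * (not done) - Q[s, a])
        s = s_next                                  # 5. repeat to convergence
        if done:
            break

print(np.argmax(Q, axis=1).reshape(SIZE, SIZE))     # greedy action per cell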
Description: On-policy TD control algorithm that updates Q-values based on the action actually taken.
Update Rule: Q(s,a) ← Q(s,a) + α[r + γQ(s',a') - Q(s,a)]
Best for: Safer learning in settings where exploratory actions can be costly, since the update accounts for the policy actually being followed
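For contrast with Q-learning's max over next actions, here is a minimal SARSA sketch; the env interface (reset()/step()) and the hyperparameter defaults are assumptions for illustration.

# SARSA (a sketch): the target uses Q(s', a') for the action the agent
# actually takes next, instead of Q-learning's max over a'.
import numpy as np

def sarsa(env, n_states, n_actions, alpha=0.1, gamma=0.9,
          epsilon=0.1, episodes=500, seed=0):
    """env is assumed to expose reset() -> s and step(a) -> (s', r, done)."""
    rng = np.random.default_rng(seed)
    Q = np.zeros((n_states, n_actions))

    def pick(s):                                    # epsilon-greedy choice
        if rng.random() < epsilon:
            return int(rng.integers(n_actions))
        return int(np.argmax(Q[s]))

    for _ in range(episodes):
        s, done = env.reset(), False
        a = pick(s)
        while not done:
            s_next, r, done = env.step(a)
            a_next = pick(s_next)                   # action actually taken next
            target = r + gamma * Q[s_next, a_next] * (not done)
            Q[s, a] += alpha * (target - Q[s, a])   # on-policy TD update
            s, a = s_next, a_next
    return Q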
Description: Uses deep neural networks to approximate Q-values for high-dimensional state spaces.
Key Features: Experience replay, target network, handles continuous states
Best for: Complex environments like Atari games, robotics
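The sketch below shows a stripped-down DQN training step illustrating experience replay and the target network. PyTorch is an assumed framework here, and the network size, buffer capacity, and hyperparameters are illustrative choices rather than those of any particular published implementation.

# A minimal DQN training step (a sketch): sample past transitions from a
# replay buffer and bootstrap targets from a periodically-synced target net.
import random
from collections import deque
import torch
import torch.nn as nn

def make_net(n_obs, n_actions):
    return nn.Sequential(nn.Linear(n_obs, 64), nn.ReLU(),
                         nn.Linear(64, n_actions))

n_obs, n_actions, gamma = 4, 2, 0.99                 # illustrative sizes
q_net = make_net(n_obs, n_actions)
target_net = make_net(n_obs, n_actions)
target_net.load_state_dict(q_net.state_dict())       # target starts in sync
optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-3)
replay = deque(maxlen=10_000)                         # experience replay buffer
# during interaction: replay.append((state, action, reward, next_state, float(done)))
# where state / next_state are lists of floats of length n_obs

def train_step(batch_size=32):
    if len(replay) < batch_size:
        return
    batch = random.sample(list(replay), batch_size)   # break correlation in data
    s, a, r, s2, done = map(torch.tensor, zip(*batch))
    q = q_net(s.float()).gather(1, a.long().unsqueeze(1)).squeeze(1)
    with torch.no_grad():                             # bootstrap from target net
        target = r.float() + gamma * target_net(s2.float()).max(1).values * (1 - done.float())
    loss = nn.functional.mse_loss(q, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    # every N environment steps: target_net.load_state_dict(q_net.state_dict())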
Description: Directly optimizes the policy by gradient ascent on expected reward.
Algorithms: REINFORCE, Actor-Critic, PPO, A3C
Best for: Continuous action spaces, stochastic policies
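The core policy-gradient idea in its smallest form is sketched below: REINFORCE on a one-step toy problem with a softmax policy, using the log-derivative trick to ascend expected reward. The reward values and learning rate are illustrative assumptions; Actor-Critic, PPO, and A3C build on the same update.

# REINFORCE in its simplest form (a sketch): gradient ascent on expected
# reward, theta <- theta + alpha * G * grad log pi(a), on a one-step task.
import numpy as np

rng = np.random.default_rng(0)
rewards = np.array([0.0, 1.0, 5.0])          # reward for each of 3 actions
theta = np.zeros(3)                          # policy parameters (logits)
alpha = 0.05

def policy(theta):
    z = np.exp(theta - theta.max())          # softmax over action logits
    return z / z.sum()

for episode in range(2000):
    probs = policy(theta)
    a = rng.choice(3, p=probs)               # sample an action from the policy
    G = rewards[a]                           # return of this one-step episode
    grad_log_pi = -probs                     # d/dtheta of log softmax(theta)[a]
    grad_log_pi[a] += 1.0
    theta += alpha * G * grad_log_pi         # REINFORCE update

print(np.round(policy(theta), 3))            # mass concentrates on the best action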
Answer these questions to test your understanding of reinforcement learning concepts.