Agents, environments, Q-learning, and policy gradients
Markov Decision Processes, value functions, and the exploration-exploitation trade-off
From Q-tables to deep neural network approximators
REINFORCE, Actor-Critic, and Proximal Policy Optimization
Gymnasium, Stable Baselines3, RLHF, robotics, and game AI