
Code for my walkthrough of *Reinforcement Learning: An Introduction* by Richard Sutton and Andrew Barto

  • Bandits
    • Chapter 2: Multi-armed Bandits
  • Finite MDPs
    • Chapter 3: Finite Markov Decision Processes

Quickstart

Algorithm implementations live in the src/ directory, while the scaffolding code and notebooks for recreating and exploring Sutton & Barto are organized under the experiments/ directory.

e.g. to recreate Figure 2.3, navigate to experiments/ch2_bandits/ and run:

python run.py -m run.steps=1000 run.n_runs=2000 +bandit.epsilon=0,0.01,0.1 +bandit.random_argmax=true experiment.tag=fig2.2 experiment.upload=true

Figure 2.3 (rlbook): The +bandit.random_argmax=true flag switches to an argmax implementation that breaks ties randomly, rather than always taking the first occurrence as NumPy's default implementation does, to better align with the original example. Link to wandb artifact
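The actual implementation lives in src/, but a minimal sketch of such a tie-breaking argmax (names here are illustrative, not the repo's) might look like:

```python
import numpy as np

def random_argmax(values, rng=None):
    """Return the index of a maximal element, breaking ties uniformly at random.

    np.argmax always returns the first maximal index, which biases
    action selection toward lower-indexed arms early in a bandit run.
    """
    rng = np.random.default_rng() if rng is None else rng
    values = np.asarray(values)
    # Indices of all elements tied for the maximum value
    candidates = np.flatnonzero(values == values.max())
    return int(rng.choice(candidates))
```

For example, on q-estimates `[1.0, 3.0, 3.0, 2.0]`, `np.argmax` always returns 1, while `random_argmax` returns 1 or 2 with equal probability.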

Further details on experimental setup and results can be found within the corresponding chapter docs.