pong-with-policy-gradients
Pong with Policy Gradients 🔨👷
Code for an intro to RL workshop. You'll be training a simple RL agent to play Pong using vanilla policy gradients 😮💯
Adapted from http://karpathy.github.io/2016/05/31/rl/ and rewritten with PyTorch.
Accompanying slides are here.
Trained RL agent (green paddle) vs ball-tracking AI (tan paddle).
Instructions
👩‍🏫 🗣 There are five ### TODO: statements where you'll need to fill in short pieces of code (no more than a few lines each) that define the policy network and calculate the policy gradients.
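For intuition on what the policy-gradient TODOs compute, here is a framework-free sketch. It is not the workshop solution and not necessarily how the repo structures its code. It assumes a two-action policy (UP/DOWN) whose network outputs a single logit, as in Karpathy's post; sigmoid, logit_gradients and reinforce_loss are hypothetical helper names.

```python
import math


def sigmoid(x):
    """Map a logit to the probability of moving the paddle UP."""
    return 1.0 / (1.0 + math.exp(-x))


def logit_gradients(logits, actions, advantages):
    """Per-step policy-gradient signal with respect to the output logit.

    For a Bernoulli policy p = sigmoid(logit), with action y = 1 for UP and
    y = 0 for DOWN, d/dlogit log pi(y) = y - p. Vanilla policy gradients
    (REINFORCE) scale this by each step's advantage (e.g. its discounted
    return); gradient *ascent* along it makes well-rewarded actions likelier.
    """
    grads = []
    for logit, y, adv in zip(logits, actions, advantages):
        p = sigmoid(logit)
        grads.append((y - p) * adv)
    return grads


def reinforce_loss(logits, actions, advantages):
    """Loss to *minimise*: -sum_t advantage_t * log pi(a_t | s_t).

    Its derivative w.r.t. each logit is the negative of logit_gradients,
    which is what an autograd framework would compute from it.
    """
    loss = 0.0
    for logit, y, adv in zip(logits, actions, advantages):
        p = sigmoid(logit)
        log_prob = math.log(p) if y == 1 else math.log(1.0 - p)
        loss -= adv * log_prob
    return loss
```

In a PyTorch version you would typically only write something like reinforce_loss and let autograd produce the gradients; the explicit y - p form is shown here because it makes the "push up the probability of rewarded actions" idea visible.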
Training takes a few hours to converge, but you should see some improvement within a few minutes; if you don't, you probably have a bug. Check the terminal output and make use of the TensorBoard training graphs 📈
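One way to judge "some improvement" is to track an exponential moving average of the per-episode reward, as Karpathy's post does. This is an assumption about how you might monitor progress, not necessarily what pong.py prints. Assuming the standard Atari Pong environment (first to 21 points), an episode's reward lies between -21 and +21, and an untrained agent loses almost every point, so it starts close to -21. The helper name and the 0.99 decay are hypothetical choices.

```python
def update_running_reward(running, episode_reward, decay=0.99):
    """Exponential moving average of per-episode reward.

    The first episode initialises the average; after that each new episode
    contributes a (1 - decay) share of its reward, so the curve is smooth
    enough to show a trend despite noisy individual games.
    """
    if running is None:
        return episode_reward
    return decay * running + (1.0 - decay) * episode_reward


running = None
for episode_reward in [-21.0, -20.0, -19.0]:
    running = update_running_reward(running, episode_reward)
    print("episode reward %.1f, running mean %.4f" % (episode_reward, running))
```

A running mean that creeps upward from about -21 is the kind of early signal to look for; a flat line after many episodes suggests a bug.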
The solution and a trained network are in the solution folder (spoiler alert!), but try to do it yourself first! You got this 🤠
Setup
Make sure you have a working 64-bit installation of Python >= 3.5. You can see which version you have by running python on its own: the version is printed at the top of the interactive prompt.
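To check both requirements at once, here is a small standalone sketch using only the standard library. The function name and the file name check_python.py below are hypothetical.

```python
import struct
import sys


def check_python(min_version=(3, 5)):
    """Return (version_ok, is_64_bit) for the running interpreter."""
    version_ok = sys.version_info[:2] >= min_version
    # A pointer ("P") is 8 bytes on a 64-bit build and 4 bytes on a 32-bit one.
    is_64_bit = struct.calcsize("P") * 8 == 64
    return version_ok, is_64_bit


if __name__ == "__main__":
    ok, bits64 = check_python()
    print("Python", sys.version.split()[0])
    print("version >= 3.5:", ok)
    print("64-bit:", bits64)
```

Save it as check_python.py and run python check_python.py; both checks should print True.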
Install virtualenv and create a new virtual environment (the commands below use Python's built-in venv module, so installing virtualenv is optional):
On macOS and Linux:
python3 -m pip install --user virtualenv
python3 -m venv env
source env/bin/activate
On Windows:
python -m pip install --user virtualenv
python -m venv env
.\env\Scripts\activate
(P.S. you can leave the virtual environment by entering deactivate into the terminal when you're done.)
Install the dependencies:
pip install -r requirements.txt
Note: on Windows, PyTorch may fail to install with the command above. In that case, install it manually with
pip install torch==1.6.0+cpu torchvision==0.7.0+cpu -f https://download.pytorch.org/whl/torch_stable.html
See the PyTorch website for more details.
Running the Code
To run it yourself:
$ python pong.py [--render]
where --render is an optional flag that renders the Pong games and slows them down to a watchable speed.
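If you haven't met optional on/off flags before, this is how one like --render is commonly declared with the standard argparse module. It is a hypothetical parser for illustration, not pong.py's actual code.

```python
import argparse


def parse_args(argv=None):
    """Hypothetical parser with a single optional --render switch."""
    parser = argparse.ArgumentParser(
        description="Train a Pong agent with policy gradients.")
    # store_true: the flag defaults to False and becomes True when present.
    parser.add_argument("--render", action="store_true",
                        help="render games and slow them down to a watchable speed")
    return parser.parse_args(argv)


print(parse_args([]).render)
print(parse_args(["--render"]).render)
```

Passing argv=None makes argparse read the real command line, so the same function works both in scripts and in quick tests like the two prints above.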
To test:
$ python test.py
(The tests are a helpful guide, but they only check the policy network and the discounted-reward calculation, and they don't guarantee correctness!)
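The discounted-reward calculation that the tests cover can be sketched in plain Python, following Karpathy's post. The gamma = 0.99 default is an assumption, and the running sum is reset whenever a reward is non-zero because in Pong that marks the end of a point. The function name and parameters are hypothetical, so match whatever signature the TODO in the repo expects.

```python
def discount_rewards(rewards, gamma=0.99, reset_on_nonzero=True):
    """Compute discounted returns G_t = r_t + gamma * G_{t+1}, back to front.

    With reset_on_nonzero=True the running sum restarts at every non-zero
    reward (a Pong-specific point boundary); set it to False for a plain
    discounted return over the whole sequence.
    """
    discounted = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        if reset_on_nonzero and rewards[t] != 0:
            running = 0.0  # a point was scored here: start a fresh sum
        running = running * gamma + rewards[t]
        discounted[t] = running
    return discounted


print(discount_rewards([0, 0, 1], gamma=0.5))  # → [0.25, 0.5, 1.0]
```

Karpathy's post then standardises these values (subtract the mean, divide by the standard deviation) before using them to scale the policy gradient.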
To view TensorBoard visualizations during training, open a separate terminal, activate the virtual environment, and run
$ tensorboard --logdir tensorboard_logs
and visit http://localhost:6006/.