Quantum agents in the Gym: a variational quantum algorithm for deep Q-learning
Description
This repository contains an implementation of the Quantum Deep Q-learning algorithm and its application to the FrozenLake and CartPole environments, as described in:
- Paper: Quantum agents in the Gym: a variational quantum algorithm for deep Q-learning
- Authors: Skolik, Jerbi and Dunjko
- Date: 2021
Hyperparameters
| Hyperparameters | Frozen-Lake | Cart-Pole | Explanation |
|---|---|---|---|
| n_layers | 5,10,15 | 5 | number of layers |
| gamma | 0.8 | 0.99 | discount factor for Q-learning |
| w_input | True, False | True, False | train weights on the model input |
| w_output | True, False | True, False | train weights on the model output |
| lr | 0.001 | 0.001 | model parameter learning rate |
| lr_input | 0.001 | 0.001 | input weight learning rate |
| lr_output | 0.1 | 0.1 | output weight learning rate |
| batch_size | 11 | 16 | number of samples shown to optimizer at each update |
| eps_init | 1. | 1. | initial value for ε-greedy policy |
| eps_decay | 0.99 | 0.99 | decay of ε for ε-greedy policy |
| eps_min | 0.01 | 0.01 | minimal value of ε for ε-greedy policy |
| train_freq | 5 | 10 | steps in episode after which model is updated |
| target_freq | 10 | 30 | steps in episode after which target is updated |
| memory | 10000 | 10000 | size of memory for experience replay |
| data_reupload | True, False | True, False | use data re-uploading |
| loss | SmoothL1 | SmoothL1 | loss type: MSE, L1 or SmoothL1 |
| optimizer | RMSprop | RMSprop | optimizer type: SGD, RMSprop, Adam, ... |
| total_episodes | 3500 | 5000 | total training episodes |
| n_eval_episodes | 5 | 5 | number of episodes to evaluate the agent |
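To illustrate how the exploration hyperparameters interact, here is a minimal sketch of the ε-greedy schedule defined by eps_init, eps_decay and eps_min, assuming the decay is applied multiplicatively once per episode (the exact update point used in train.py may differ):

```python
import random

def select_action(q_values, epsilon, n_actions):
    # Explore with probability epsilon, otherwise act greedily on the
    # Q-values produced by the (quantum) model.
    if random.random() < epsilon:
        return random.randrange(n_actions)
    return max(range(n_actions), key=lambda a: q_values[a])

eps, eps_decay, eps_min = 1.0, 0.99, 0.01  # eps_init, eps_decay, eps_min
for episode in range(5000):                # total_episodes (Cart-Pole column)
    # ... run one episode, selecting actions with select_action(...) ...
    eps = max(eps_min, eps * eps_decay)    # anneal exploration towards eps_min
```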
Experiments
The experiments in the paper are reproduced using PyTorch for optimization, PennyLane for quantum circuits and Gym for the environments.
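For orientation, the sketch below shows how a variational Q-network in this spirit can be wrapped as a PyTorch module with PennyLane. It assumes 4 qubits for CartPole's 4-dimensional state, data re-uploading enabled, an AngleEmbedding/StronglyEntanglingLayers ansatz, and trainable input/output scaling weights; the actual circuit, observables and class names in this repository may differ.

```python
import torch
import torch.nn as nn
import pennylane as qml

n_qubits = 4   # one qubit per CartPole state dimension
n_layers = 5   # "n_layers" from the hyperparameter table
dev = qml.device("default.qubit", wires=n_qubits)

@qml.qnode(dev, interface="torch")
def circuit(inputs, weights):
    # Data re-uploading: re-encode the (scaled) state before every variational layer.
    for layer in range(n_layers):
        qml.AngleEmbedding(inputs, wires=range(n_qubits), rotation="X")
        qml.StronglyEntanglingLayers(weights[layer : layer + 1], wires=range(n_qubits))
    # Two Q-values (push left / push right) read out as Pauli-Z correlators.
    return [
        qml.expval(qml.PauliZ(0) @ qml.PauliZ(1)),
        qml.expval(qml.PauliZ(2) @ qml.PauliZ(3)),
    ]

class QuantumQNetwork(nn.Module):
    """Variational circuit wrapped as a PyTorch module (illustrative only)."""

    def __init__(self, w_input=True, w_output=True):
        super().__init__()
        self.qlayer = qml.qnn.TorchLayer(circuit, {"weights": (n_layers, n_qubits, 3)})
        # Optional trainable scaling of inputs (w_input) and outputs (w_output).
        self.input_weights = nn.Parameter(torch.ones(n_qubits)) if w_input else None
        self.output_weights = nn.Parameter(torch.ones(2)) if w_output else None

    def forward(self, state):
        x = state * self.input_weights if self.input_weights is not None else state
        q_values = self.qlayer(x)
        if self.output_weights is not None:
            return q_values * self.output_weights
        return q_values * 90.0  # fixed output scaling when w_output is False

net = QuantumQNetwork()
print(net(torch.tensor([0.1, -0.2, 0.05, 0.3])))
```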
Training
- Option 1: Open in Colab. You can activate the GPU in the Notebook Settings.
- Option 2: Run on a local machine. First, install the required packages:
$ pip install gym torch torchvision pennylane tensorboard
You can run an experiment using the following command:
$ cd cart_pole/
$ python train.py
You can set your own hyperparameters:
$ cd cart_pole/
$ python train.py --batch_size=32
The list of hyperparameters is given above and is accessible via:
$ cd cart_pole/
$ python train.py --help
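The exact flag parsing lives in train.py; purely as an illustration, a command-line interface exposing hyperparameters like the ones above is typically wired up with argparse along these lines (only --batch_size is confirmed by the example above, the remaining flag names simply mirror the hyperparameter table):

```python
import argparse

# Hypothetical flag set mirroring the hyperparameter table.
parser = argparse.ArgumentParser(description="Train a quantum DQN agent (sketch)")
parser.add_argument("--n_layers", type=int, default=5, help="number of layers")
parser.add_argument("--gamma", type=float, default=0.99, help="discount factor")
parser.add_argument("--lr", type=float, default=0.001, help="model parameter learning rate")
parser.add_argument("--batch_size", type=int, default=16, help="samples per optimizer update")
parser.add_argument("--total_episodes", type=int, default=5000, help="total training episodes")
args = parser.parse_args()
print(args)
```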
To monitor the training process with TensorBoard:
$ cd cart_pole/
$ python train.py
$ tensorboard --logdir logs/
The hyperparameters, checkpoints, training and evaluation metrics are saved in the logs/ folder.
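How train.py writes these files is its own concern; as a hedged illustration, scalar metrics viewable in TensorBoard are usually written with torch.utils.tensorboard roughly like this (the tag name and the exp_name directory are placeholders, not the actual layout produced by this repo):

```python
from torch.utils.tensorboard import SummaryWriter

writer = SummaryWriter(log_dir="logs/exp_name")  # placeholder experiment directory
for episode in range(100):
    episode_reward = 0.0                         # placeholder metric value
    writer.add_scalar("episode_reward", episode_reward, episode)
writer.close()
```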
Testing
You can test your agent by passing the path to your logged model:
$ cd cart_pole/
$ python test.py --path=logs/exp_name/ --n_eval_episodes=10
Trained agents are also provided in the logs/ folder:
$ cd cart_pole/
$ python test.py --path=logs/input_only/ --n_eval_episodes=10
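Under the hood, evaluation amounts to running the greedy policy for n_eval_episodes and averaging the returns. A minimal sketch, assuming the classic Gym CartPole-v0 API; checkpoint loading is omitted since the actual file layout under logs/ is not shown here, and q_network below is just a stand-in for the loaded agent:

```python
import gym
import torch

def greedy_action(q_values):
    # Greedy policy: pick the action with the largest Q-value estimate.
    return int(torch.argmax(q_values))

def q_network(state):
    # Stand-in for the trained agent; test.py loads the real model from --path.
    return torch.zeros(2)

env = gym.make("CartPole-v0")   # classic Gym API: reset() -> obs, step() -> 4-tuple
n_eval_episodes = 10
returns = []
for _ in range(n_eval_episodes):
    state, done, total = env.reset(), False, 0.0
    while not done:
        with torch.no_grad():
            action = greedy_action(q_network(torch.as_tensor(state, dtype=torch.float32)))
        state, reward, done, _ = env.step(action)
        total += reward
    returns.append(total)
print("average reward:", sum(returns) / len(returns))
```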
Results
Cart-Pole
When no output weights are trained, the circuit output is multiplied by a fixed factor of 90.
| Setting | Average Reward | Hyperparameters and Checkpoints |
|---|---|---|
| No Weights | 181 | cart_pole/logs/no_weights/ |
| Input Weights | 200 | cart_pole/logs/input_only/ |
| Output Weights | 101 | cart_pole/logs/output_only/ |
| Input and Output Weights | 199 | cart_pole/logs/input_output/ |