homomorphic_policy_gradient
Author's PyTorch Implementation of Deep Homomorphic Policy Gradient (DHPG) - NeurIPS 2022 and JMLR 2024
Continuous MDP Homomorphisms and Homomorphic Policy Gradients
Update (May 12, 2023)
- The stochastic DHPG algorithm has been added to this repository. The original NeurIPS 2022 paper proposes deterministic DHPG, while the extended paper (see the JMLR 2024 citation below) introduces stochastic DHPG and compares it against the deterministic variant.
- In order to run the new agents with the stochastic policy, follow the instructions below and simply use agent=stochastic_hpg for stochastic DHPG, or agent=stochastic_hpg_aug for stochastic DHPG with image augmentation (see the example command after this list).
- The novel symmetric environments (Section 7.2) are in the repos symmetry_RL and mountain_car_3D.
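For example, the stochastic agent can be launched with the same training script described in the instructions below (pendulum_swingup is only an illustrative task taken from those instructions):
python train.py task=pendulum_swingup agent=stochastic_hpg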
Instructions for the NeurIPS 2022 Paper
- Author's PyTorch implementation of Deep Homomorphic Policy Gradients (DHPG). If you use our code, please cite our NeurIPS 2022 paper (see the Citation section below).
- DHPG simultaneously learns the MDP homomorphism map and the optimal policy, using the homomorphic policy gradient theorem for continuous control problems.
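The actor update can be pictured with a minimal conceptual sketch (this is not the repository's actual code; the argument names below are hypothetical, and it only illustrates the hpg variant, in which the DPG and HPG terms are summed for a single actor update):

def actor_loss(obs, actor, critic, abstract_critic, state_encoder, action_encoder):
    """Hypothetical helper: obs is a batch of observations (torch.Tensor);
    the remaining arguments are torch.nn.Module networks."""
    # Deterministic policy gradient (DPG) term, computed in the actual MDP.
    action = actor(obs)
    dpg_term = critic(obs, action)

    # Homomorphic policy gradient (HPG) term, computed in the abstract MDP:
    # map the observation and action through the learned MDP homomorphism.
    abstract_obs = state_encoder(obs)
    abstract_action = action_encoder(obs, action)
    hpg_term = abstract_critic(abstract_obs, abstract_action)

    # The hpg variant sums the two terms for a single actor update;
    # hpg_ind applies them in independent actor updates instead.
    return -(dpg_term + hpg_term).mean()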
I. Setup
- Install the following libraries needed for Mujoco and DeepMind Control Suite:
sudo apt update
sudo apt install libosmesa6-dev libgl1-mesa-glx libglfw3
- Install Mujoco and DeepMind Control Suite following the official instructions.
- We recommend using a conda virtual environment to run the code. Create the virtual environment:
conda create -n hpg_env python=3.8
conda activate hpg_env
pip install --upgrade pip
- Install dependencies of this package:
pip install -r requirements.txt
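To quickly verify the setup (an optional check, assuming torch and dm_control are among the installed dependencies), you can confirm that the key packages import and that a CUDA device is visible:
python -c "import torch, dm_control; print(torch.cuda.is_available())"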
II. Instructions
- This code includes our Python implementation of DHPG and all the baseline algorithms used in the paper:
- Pixel observations: Deterministic DHPG, Stochastic DHPG, DBC, DeepMDP, SAC-AE, DrQ-v2.
- State observations: Deterministic DHPG, Stochastic DHPG, TD3, DDPG, SAC.
- Results were obtained with Python v3.8.10, CUDA v11.4, and PyTorch v1.10.0, using 10 seeds.
Training on Pixel Observations (Section 7.2, Appendices D.2, D.5, D.6 of the NeurIPS Paper)
- To train agents on pixel observations:
python train.py task=pendulum_swingup agent=hpg
- Available DHPG agents are: hpg, hpg_aug, stochastic_hpg, stochastic_hpg_aug, hpg_ind, hpg_ind_aug.
  - hpg is the deterministic DHPG variant in which the gradients of HPG and DPG are summed together for a single actor update (hpg_aug is hpg with image augmentation).
  - stochastic_hpg is stochastic DHPG (stochastic_hpg_aug is stochastic_hpg with image augmentation).
  - hpg_ind is the deterministic DHPG variant in which the gradients of HPG and DPG are used to independently update the actor (hpg_ind_aug is hpg_ind with image augmentation).
  - See Appendix D.5 for more information on these variants; example commands follow this list.
- Available baseline agents are: drqv2, dbc, deepmdp, sacae.
- You can run each baseline with image augmentation by simply adding _aug to the end of its name. For example, dbc_aug runs dbc with image augmentation.
- If you do not have a CUDA device, use device=cpu.
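For example, the options above can be combined in commands following the same pattern (pendulum_swingup is only a placeholder task):
python train.py task=pendulum_swingup agent=hpg_aug
python train.py task=pendulum_swingup agent=dbc_aug device=cpu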
Training on State Observations (Section 7.1, Appendix D.1 of the NeurIPS Paper)
- To train agents on state observations:
python train.py pixel_obs=false action_repeat=1 frame_stack=1 task=pendulum_swingup agent=hpg
- Available DHPG agents are: hpg, hpg_ind, stochastic_hpg.
- Available baseline agents are: td3, sac, ddpg_original, ddpg_ours.
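For example, a baseline agent can be trained on state observations by substituting its name into the command above:
python train.py pixel_obs=false action_repeat=1 frame_stack=1 task=pendulum_swingup agent=td3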
Transfer Experiments (Appendix D.3 of the NeurIPS Paper)
- To run the transfer experiments, use python transfer.py with the same configurations discussed above for pixel observations, but use cartpole_transfer, quadruped_transfer, walker_transfer, or hopper_transfer as the task argument.
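For example (assuming transfer.py accepts the same arguments as train.py, as described above):
python transfer.py task=walker_transfer agent=hpg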
Tensorboard
- To monitor results, use:
tensorboard --logdir exp
III. Citation
If you are using our code, please cite our NeurIPS 2022 paper:
@article{rezaei2022continuous,
title={Continuous mdp homomorphisms and homomorphic policy gradient},
author={Rezaei-Shoshtari, Sahand and Zhao, Rosie and Panangaden, Prakash and Meger, David and Precup, Doina},
journal={Advances in Neural Information Processing Systems},
volume={35},
pages={20189--20204},
year={2022}
}
And our extended JMLR paper which contains the theoretical and empirical results for stochastic policies:
@article{panangaden2024policy,
author = {Prakash Panangaden and Sahand Rezaei-Shoshtari and Rosie Zhao and David Meger and Doina Precup},
title = {Policy Gradient Methods in the Presence of Symmetries and State Abstractions},
journal = {Journal of Machine Learning Research},
year = {2024},
volume = {25},
number = {71},
pages = {1--57},
url = {http://jmlr.org/papers/v25/23-1415.html}
}