causal-confusion
Code for paper Causal Confusion in Imitation Learning
Causal Confusion in Imitation Learning
This is the code accompanying the paper: "Causal Confusion in Imitation Learning" by Pim de Haan, Dinesh Jayaraman and Sergey Levine, published at NeurIPS 2019. See the website for a video presentation of the work.
This simplified code implements graph-conditioned policy learning and intervention by policy execution for the MountainCar environment. Code for the other environments and intervention modes may be published at a later stage.
For questions or comments, feel free to submit an issue.
Dependencies
The setup assumes a machine with CUDA 10. On a machine without a GPU or with a different CUDA version, you may need to adjust the PyTorch and TensorFlow dependencies.
Full dependency setup:
conda env create
Or by hand:
conda create -n causal-confusion python=3.6
conda activate causal-confusion
conda install pytorch=1.0.1 torchvision cudatoolkit=10.0 ignite -c pytorch
conda install tensorflow-gpu==1.14 mpi4py scikit-learn
pip install git+https://github.com/pimdh/baselines@no-mujoco
Note that this installs a modified version of OpenAI Baselines, because the provided pickle of the MountainCar expert does not work with the upstream version.
I also modified Baselines' setup.py to remove the MuJoCo dependency, which makes setup easier.
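After installing, a quick check that the packages can be found may save a failed run later. The following is a minimal sketch, not part of this repository: the helper name find_missing is hypothetical, and the module names are assumed from the install commands above (for example, scikit-learn is imported as sklearn).

```python
import importlib.util

# Import names assumed from the install commands above; adjust if your
# environment differs. "baselines" is the import name of OpenAI Baselines.
REQUIRED_MODULES = [
    "torch", "torchvision", "ignite",
    "tensorflow", "mpi4py", "sklearn", "baselines",
]


def find_missing(modules=REQUIRED_MODULES):
    """Return the top-level module names that cannot be located for import."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = find_missing()
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules were found.")
```

This only checks that the modules are discoverable. It does not verify versions or that CUDA is usable.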
Usage
First generate demonstrations:
python -m ccil.gen_data
To show causal confusion with a simple behaviour cloning agent on the original and confounded states:
python -m ccil.imitate original simple
python -m ccil.imitate confounded simple
To train the graph-parametrized policy on the confounded state:
python -m ccil.imitate confounded uniform --save
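For intuition, graph conditioning can be pictured as feeding the policy a masked state together with the mask. The sketch below follows the general idea described in the paper and is not this repository's implementation: the function names, the three-element example state, and the reading of "uniform" as sampling each mask bit with probability 0.5 are all assumptions made for illustration.

```python
import random


def sample_graph(num_features, rng=random):
    """Sample a binary mask; each feature is kept with probability 0.5 (assumed)."""
    return [rng.randint(0, 1) for _ in range(num_features)]


def policy_input(state, graph):
    """Zero out the masked-off features, then append the mask itself."""
    if len(state) != len(graph):
        raise ValueError("state and graph must have the same length")
    masked = [s * g for s, g in zip(state, graph)]
    return masked + list(graph)


if __name__ == "__main__":
    # Hypothetical confounded state: two original features plus one extra feature.
    state = [-0.5, 0.01, 2.0]
    graph = [1, 1, 0]  # keep the first two features, hide the third
    print(policy_input(state, graph))  # [-0.5, 0.01, 0.0, 1, 1, 0]
```

Appending the mask lets a single network learn a policy for every graph, since it can tell a hidden feature apart from a feature whose value happens to be zero.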
To perform intervention by policy execution:
python -m ccil.intervention_policy_execution --num_its 10
Optionally, set the DATA_PATH environment variable to change the location of data files from the default ./data.
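The lookup behind such a setting usually amounts to reading the environment variable with a fallback. The following is a minimal sketch of that pattern, assuming the variable holds a directory path. The helper resolve_data_dir is hypothetical, and how ccil reads DATA_PATH internally is not shown here.

```python
import os
from pathlib import Path


def resolve_data_dir(env=None):
    """Return the data directory: DATA_PATH if set, otherwise ./data."""
    env = os.environ if env is None else env
    return Path(env.get("DATA_PATH", "./data"))


if __name__ == "__main__":
    print(resolve_data_dir())
```

Passing a plain dict as env makes the fallback easy to try without changing the real environment, for example resolve_data_dir({"DATA_PATH": "/tmp/ccil"}).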