RORL: Robust Offline Reinforcement Learning via Conservative Smoothing
Code for the NeurIPS 2022 paper "RORL: Robust Offline Reinforcement Learning via Conservative Smoothing". RORL trades off robustness and conservatism for offline RL via conservative smoothing and OOD underestimation.
The implementation is based on EDAC and rlkit.

1. Requirements
To install the required dependencies:
- Install the MuJoCo 2.0 engine, which can be downloaded from here.
- Install the Python packages in the requirements file, together with d4rl and dm_control. The commands are as follows:
conda create -n rorl python=3.7
conda activate rorl
pip install --no-cache-dir -r requirements.txt
git clone https://github.com/rail-berkeley/d4rl.git
cd d4rl
# Note: remove lines including 'dm_control' in setup.py of d4rl
pip install -e .
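After installation, you can sanity-check the environment and dataset setup with a minimal Python snippet. This is only an illustrative check; the task name 'hopper-medium-v2' is an example D4RL Gym-MuJoCo task, not a required choice.
```python
# Quick installation check: load a D4RL Gym-MuJoCo dataset.
# 'hopper-medium-v2' is only an example task name.
import gym
import d4rl  # importing d4rl registers its environments with gym

env = gym.make('hopper-medium-v2')
dataset = env.get_dataset()            # dict with 'observations', 'actions', 'rewards', ...
print(dataset['observations'].shape)   # (num_transitions, obs_dim)
print(dataset['actions'].shape)        # (num_transitions, act_dim)
```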
2. Usage
2.1 Training
RORL Experiments for MuJoCo Gym:
python -m scripts.sac --env_name [ENVIRONMENT] --num_qs 10 --norm_input --load_config_type 'benchmark' --exp_prefix RORL
To reproduce the results of the adversarial experiments, simply replace 'benchmark' with 'attack'.
SAC-10:
python -m scripts.sac --env_name [ENVIRONMENT] --num_qs 10 --norm_input --exp_prefix SAC
EDAC:
python -m scripts.sac --env_name [ENVIRONMENT] --num_qs 10 --eta 1 --norm_input --exp_prefix EDAC
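To launch the RORL benchmark configuration over several tasks, a small launcher script is one option. The sketch below assumes that scripts.sac accepts D4RL task names such as 'hopper-medium-v2' for --env_name; adjust the task list and name format to whatever the script actually expects.
```python
# Hypothetical launcher for the RORL benchmark runs; the task names are examples.
import subprocess

ENVS = ['halfcheetah-medium-v2', 'hopper-medium-v2', 'walker2d-medium-v2']

for env_name in ENVS:
    cmd = [
        'python', '-m', 'scripts.sac',
        '--env_name', env_name,
        '--num_qs', '10',
        '--norm_input',
        '--load_config_type', 'benchmark',
        '--exp_prefix', 'RORL',
    ]
    subprocess.run(cmd, check=True)  # runs sequentially; parallelize if resources allow
```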
2.2 Evaluation
To evaluate trained agents in clean environments, run
python -m scripts.sac --env_name [ENVIRONMENT] --num_qs 10 --norm_input --eval_no_training --load_path [model path] --exp_prefix eval_RORL
'model path': the path to a saved checkpoint, e.g., ~/offline_itr_3000.pt.
To evaluate trained agents in adversarial environments, run
python -m scripts.sac --env_name [ENVIRONMENT] --num_qs 10 --norm_input --eval_no_training --load_path [model path] --eval_attack --eval_attack_mode [mode] --eval_attack_eps [epsilon] --exp_prefix eval_RORL
'mode': one of 'random', 'action_diff', 'min_Q', 'action_diff_mixed_order', 'min_Q_mixed_order'.
'epsilon': a perturbation scale in [0.0, 0.3].
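The attack modes perturb the agent's observations within an epsilon ball before they are fed to the policy. The repo implements these attacks internally; the sketch below is only a conceptual illustration of the 'random' and 'min_Q' modes, and the policy/Q interfaces, sampling scheme, and tensor shapes are assumptions rather than the repo's API.
```python
# Conceptual illustration of observation-space attacks; NOT the repo's implementation.
import torch

def random_attack(obs, eps):
    """'random' mode: add uniform noise inside an l_inf ball of radius eps."""
    noise = (2.0 * torch.rand_like(obs) - 1.0) * eps
    return obs + noise

def min_q_attack(obs, policy, q_fn, eps, num_samples=20):
    """'min_Q' mode (sketch): among random candidate perturbations, return the
    observation whose policy action receives the lowest attacker Q-value."""
    candidates = [obs] + [random_attack(obs, eps) for _ in range(num_samples)]
    values = [q_fn(o, policy(o)).item() for o in candidates]
    return candidates[values.index(min(values))]
```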
To evaluate trained agents in adversarial environments with different Q functions, run
python -m scripts.sac --env_name [ENVIRONMENT] --num_qs 10 --norm_input --eval_no_training --load_path [model path] --eval_attack --eval_attack_mode [mode] --eval_attack_eps [epsilon] --load_Qs [Qs path] --exp_prefix eval_RORL
'Qs path': the path to the attacker's Q functions, which can differ from the evaluated agent's Q functions.
Tips for Customizing RORL
Based on our ablation study in Appendix C, we summarize some tips for adapting RORL to customized use below.
- Hyper-parameter tuning: RORL targets a challenging problem and therefore has many hyper-parameters. Our first suggestion is to use the hyper-parameter search ranges in Appendix B.1. Tune them according to the importance of each component; the general order is: OOD loss > policy smoothing loss > Q smoothing loss.
- Computation cost: To reduce GPU memory usage and training time, you can (1) set $\beta_Q = 0$ and $\epsilon_Q = 0$, because the Q smoothing loss contributes the least while incurring a large computational cost, and (2) use a small number $n$ of sampled perturbed states (see the sketch after this list).
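As a reference for point (2), the sketch below shows how the number $n$ of sampled perturbed states enters a smoothing-style loss and why memory grows with $n$. It is a conceptual example under assumed interfaces (a callable returning the policy mean, l_inf-ball sampling), not the repo's exact loss.
```python
# Conceptual sketch of a policy smoothing term; interfaces and loss form are assumed.
import torch

def sample_perturbed_states(states, eps, n):
    """Sample n perturbed copies of each state inside an l_inf ball of radius eps."""
    noise = (2.0 * torch.rand(n, *states.shape) - 1.0) * eps
    return states.unsqueeze(0) + noise                 # shape: (n, batch, obs_dim)

def policy_smoothing_loss(policy_mean, states, eps, n):
    """Penalize the change of the policy mean under state perturbations.
    Memory scales with n, since all perturbed states pass through the policy."""
    perturbed = sample_perturbed_states(states, eps, n)
    mean_clean = policy_mean(states)                                   # (batch, act_dim)
    mean_pert = policy_mean(perturbed.reshape(-1, states.shape[-1]))
    mean_pert = mean_pert.reshape(n, states.shape[0], -1)
    return ((mean_pert - mean_clean.unsqueeze(0)) ** 2).mean()
```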
Citation
If you find RORL helpful for your work, please cite:
@inproceedings{yang2022rorl,
  title={RORL: Robust Offline Reinforcement Learning via Conservative Smoothing},
  author={Yang, Rui and Bai, Chenjia and Ma, Xiaoteng and Wang, Zhaoran and Zhang, Chongjie and Han, Lei},
  booktitle={Advances in Neural Information Processing Systems},
  year={2022}
}
Update
2022.11.26: Fixed the Q smoothing loss.