DSRL
🔥 Datasets and env wrappers for offline safe reinforcement learning
DSRL (Datasets for Safe Reinforcement Learning) provides a rich collection of datasets specifically designed for offline Safe Reinforcement Learning (RL). Created with the objective of fostering progress in offline safe RL research, DSRL bridges a crucial gap in the availability of safety-centric public benchmarks and datasets.

DSRL provides:
- Diverse datasets: 38 datasets across different safe RL environments and difficulty levels in SafetyGymnasium, BulletSafetyGym, and MetaDrive, all prepared with safety considerations.
- Consistent API with D4RL: For easy use and evaluation of offline learning methods.
- Data post-processing filters: Allowing alteration of data density, noise level, and reward distributions to simulate various data collection conditions.
This package is a part of a comprehensive benchmarking suite that includes FSRL and OSRL and aims to promote advancements in the development and evaluation of safe learning algorithms.
We provide a detailed breakdown of the datasets, including all the environments we use, the dataset sizes, and the cost-reward-return plot for each dataset. These details can be found in the docs folder.
To learn more, please visit our project website. If you find this code useful, please cite:
@article{liu2023datasets,
  title={Datasets and Benchmarks for Offline Safe Reinforcement Learning},
  author={Liu, Zuxin and Guo, Zijian and Lin, Haohong and Yao, Yihang and Zhu, Jiacheng and Cen, Zhepeng and Hu, Hanjiang and Yu, Wenhao and Zhang, Tingnan and Tan, Jie and others},
  journal={arXiv preprint arXiv:2306.09303},
  year={2023}
}
Installation
Install from PyPI
DSRL is currently hosted on PyPI. You can install it with:
pip install dsrl
By default, it will also install the bullet-safety-gym and safety-gymnasium environments automatically.
If you want to use the MetaDrive environment, please install it via:
pip install git+https://github.com/HenryLHH/metadrive_clean.git@main
Install from source
Pull this repo and install:
git clone https://github.com/liuzuxin/DSRL.git
cd DSRL
pip install -e .
You can also install the MetaDrive package by specifying the option:
pip install -e .[metadrive]
How to use DSRL
DSRL uses the Gymnasium API. Tasks are created via the gymnasium.make function. Each task is associated with a fixed offline dataset, which can be obtained with the env.get_dataset() method. This method returns a dictionary with:
- observations: An N × obs_dim array of observations.
- next_observations: An N × obs_dim array of next observations.
- actions: An N × act_dim array of actions.
- rewards: An N-dimensional array of rewards.
- costs: An N-dimensional array of costs.
- terminals: An N-dimensional array of episode termination flags. These are true when an episode ends due to a termination condition, such as falling over.
- timeouts: An N-dimensional array of timeout flags. These are true when an episode ends due to reaching the maximum episode length.
The usage is similar to D4RL. Here is an example:
import gymnasium as gym
import dsrl
# Create the environment
env = gym.make('OfflineCarCircle-v0')
# Each task is associated with a dataset
# dataset contains observations, next_observations, actions, rewards, costs, terminals, timeouts
dataset = env.get_dataset()
print(dataset['observations']) # An N x obs_dim Numpy array of observations
# dsrl abides by the Gymnasium interface
obs, info = env.reset()
obs, reward, terminal, timeout, info = env.step(env.action_space.sample())
cost = info["cost"]
# Apply dataset filters [optional]
# dataset = env.pre_process_data(dataset, filter_cfgs)
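As a further illustration of the dataset layout, here is a minimal sketch that splits the flat arrays into per-episode slices using the terminals and timeouts flags. It assumes every entry in the returned dictionary is a length-N array as listed above; the split_into_episodes helper is illustrative and not part of the DSRL API.
import gymnasium as gym
import numpy as np
import dsrl  # registers the Offline* tasks

def split_into_episodes(dataset):
    # An episode ends wherever either flag is set
    done = np.logical_or(dataset['terminals'], dataset['timeouts'])
    ends = np.where(done)[0]
    episodes, start = [], 0
    for end in ends:
        episodes.append({k: v[start:end + 1] for k, v in dataset.items()})
        start = end + 1
    return episodes

env = gym.make('OfflineCarCircle-v0')
episodes = split_into_episodes(env.get_dataset())
# Undiscounted reward and cost returns of the first episode
print(episodes[0]['rewards'].sum(), episodes[0]['costs'].sum())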
Datasets are automatically downloaded to the ~/.dsrl/datasets directory when get_dataset() is called. If you would like to change the location of this directory, you can set the $DSRL_DATASET_DIR environment variable to the directory of your choosing, or pass the dataset filepath directly into the get_dataset method.
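For example, a minimal way to redirect the download directory from Python, before the first get_dataset() call (the path below is only a placeholder):
import os
os.environ['DSRL_DATASET_DIR'] = '/data/dsrl_datasets'  # placeholder path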
You can run the following example scripts to play with the offline datasets of all the supported environments:
python examples/run_mujoco.py --agent [your_agent] --task [your_task]
python examples/run_bullet.py --agent [your_agent] --task [your_task]
python examples/run_metadrive.py --road [your_road] --traffic [your_traffic]
Normalizing Scores
- Set the target cost using the env.set_target_cost(target_cost) function, where target_cost is the undiscounted sum of costs of an episode.
- Use the env.get_normalized_score(return, cost_return) function to compute a normalized reward and cost for an episode, where return and cost_return are the undiscounted sums of rewards and costs of an episode, respectively.
- The individual min and max reference returns are stored in dsrl/infos.py for reference.
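Putting the two calls together, a minimal evaluation sketch might look as follows. The target cost and episode sums are placeholder numbers, and the assumption that get_normalized_score returns a (normalized reward, normalized cost) pair follows from the description above.
import gymnasium as gym
import dsrl

env = gym.make('OfflineCarCircle-v0')
env.set_target_cost(10.0)  # placeholder: allowed undiscounted episode cost

# Placeholder undiscounted sums of rewards and costs from an evaluation episode
ep_reward_return, ep_cost_return = 350.0, 40.0

normalized_reward, normalized_cost = env.get_normalized_score(ep_reward_return, ep_cost_return)
print(normalized_reward, normalized_cost)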
License
All datasets are licensed under the Creative Commons Attribution 4.0 License (CC BY), and code is licensed under the Apache 2.0 License.