
Unofficial PyTorch implementation (replicating paper results) of Implicit Q-Learning (In-sample Q-Learning) for offline RL

IQL Implementation in PyTorch


This repo is an unofficial implementation of Implicit Q-Learning (In-sample Q-Learning) in PyTorch.
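At its core, IQL fits the value function with an asymmetric (expectile) regression loss instead of a plain mean-squared error, so the value estimate tracks an upper expectile of Q without querying out-of-distribution actions. A minimal NumPy sketch of that loss (function name and signature are illustrative, not this repo's API):

```python
import numpy as np

def expectile_loss(diff, expectile=0.7):
    """Asymmetric L2 loss from the IQL paper, applied to
    diff = Q(s, a) - V(s). With expectile > 0.5, negative errors
    are down-weighted, pushing V(s) toward an upper expectile of Q."""
    diff = np.asarray(diff, dtype=np.float64)
    weight = np.where(diff > 0, expectile, 1 - expectile)
    return float(np.mean(weight * diff ** 2))
```

At expectile = 0.5 this reduces to ordinary (half-weighted) mean-squared error; the 0.7 used for locomotion tasks biases V upward.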

@inproceedings{
    kostrikov2022offline,
    title={Offline Reinforcement Learning with Implicit Q-Learning},
    author={Ilya Kostrikov and Ashvin Nair and Sergey Levine},
    booktitle={International Conference on Learning Representations},
    year={2022},
    url={https://openreview.net/forum?id=68n2s9ZJWF8}
}

Note: this implementation omits the reward standardization used in the official implementation, in which MuJoCo locomotion task rewards are divided by the difference between the returns of the best and worst trajectories in each dataset. It is straightforward to add.
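One way to add it, assuming a D4RL-style dataset dict with flat 'rewards' and 'terminals' arrays (the helper name and the 1000x scale factor mirror the official implementation's convention, but neither is part of this repo):

```python
import numpy as np

def standardize_rewards(dataset, scale=1000.0):
    """Divide rewards by the return gap between the best and worst
    trajectories in the dataset (hypothetical helper sketching the
    official implementation's MuJoCo locomotion preprocessing)."""
    rewards = np.asarray(dataset["rewards"], dtype=np.float64)
    terminals = np.asarray(dataset["terminals"], dtype=bool)

    # Split the flat reward stream into per-trajectory returns at
    # terminal flags.
    returns, ret, steps = [], 0.0, 0
    for r, done in zip(rewards, terminals):
        ret += r
        steps += 1
        if done:
            returns.append(ret)
            ret, steps = 0.0, 0
    if steps:  # trailing partial trajectory
        returns.append(ret)

    span = max(returns) - min(returns)
    dataset["rewards"] = rewards / span * scale
    return dataset
```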

Train

Gym-MuJoCo

python main_iql.py --env halfcheetah-medium-v2 --expectile 0.7 --temperature 3.0 --eval_freq 5000 --eval_episodes 10 --normalize

AntMaze

python main_iql.py --env antmaze-medium-play-v2 --expectile 0.9 --temperature 10.0 --eval_freq 50000 --eval_episodes 100
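The --temperature flag controls how greedily the policy extraction step weights actions by their advantage. A rough NumPy sketch of those advantage weights (function name and clip value are illustrative; exponentiated advantages are typically clipped like this for numerical stability):

```python
import numpy as np

def awr_weights(q_values, v_values, temperature=3.0, clip=100.0):
    """Weights for IQL's advantage-weighted policy extraction:
    exp(temperature * (Q - V)), clipped for numerical stability.
    Higher temperature concentrates the policy on high-advantage
    actions (e.g. 10.0 for AntMaze vs 3.0 for locomotion)."""
    adv = (np.asarray(q_values, dtype=np.float64)
           - np.asarray(v_values, dtype=np.float64))
    return np.minimum(np.exp(temperature * adv), clip)
```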

Results

(figure: MuJoCo locomotion results)

(figure: AntMaze results)

Acknowledgement

This repo borrows heavily from sfujim/TD3_BC and ikostrikov/implicit_q_learning.