cleanrl
DQN on MountainCar
Details
Problem Description
The PyTorch DQN implementation fails on MountainCar-v0. I tried the two settings described in this issue.
Checklist
- [x] I have installed dependencies via `poetry install` (see CleanRL's installation guideline) (required)
- [x] I have checked that there is no similar issue in the repo (required)
Current Behavior
Expected Behavior
DQN should learn the policy.
Possible Solution
Not sure what can be done. It is quite surprising that DQN fails on such a simple environment.
Steps to Reproduce
The modifications in the hotfix are the same as described in the issue:
```python
# DQN-hotfix
rb = ReplayBuffer(
    args.buffer_size,
    envs.single_observation_space,
    envs.single_action_space,
    device,
    handle_timeout_termination=False,
)

# DQN
rb = ReplayBuffer(
    args.buffer_size,
    envs.single_observation_space,
    envs.single_action_space,
    device,
    handle_timeout_termination=True,
)
```
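For context, here is a minimal sketch (my own illustration, not CleanRL's or Stable-Baselines3's actual code; the Q-value is a made-up number) of why `handle_timeout_termination` matters on MountainCar-v0: episodes there almost always end by hitting the 200-step time limit rather than by reaching the goal, and storing that truncation as a true terminal transition zeroes out the bootstrap term in the TD target.

```python
# Hedged sketch: how the stored "done" flag changes the one-step TD target
# when an episode ends because the TimeLimit wrapper truncated it, which is
# the common case on MountainCar-v0.
gamma = 0.99
reward = -1.0        # MountainCar-v0 gives -1 reward per step
next_q_max = -35.0   # hypothetical max_a Q(s', a) at the truncated state

# handle_timeout_termination=True: a time-limit truncation is stored with
# done=False, so the target still bootstraps from Q(s', a).
target_bootstrapped = reward + gamma * next_q_max

# handle_timeout_termination=False: the truncation is stored as done=True,
# so the bootstrap term is zeroed and the target collapses to the reward.
target_truncated = reward

print(target_bootstrapped, target_truncated)
```

The two targets differ by more than an order of magnitude here, which is why the flag can change learning behavior on time-limit-dominated environments.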
Hello, thanks for reporting. Could you check whether your performance matches the reported performance in the docs? https://docs.cleanrl.dev/rl-algorithms/dqn/#experiment-results_1
Basically the performance is not that great, as I found it difficult to find a set of hyperparameters that works well for all three games we tested.
@vwxyzjn Thanks for your fast response. I think the performance almost matches what we have in the docs.
Except for the second random seed: seeds 1 and 3 show very similar behavior in my experiments (they never improve over the random policy).
Do you think the unsatisfying performance is due to suboptimal hyperparameters, or can DQN simply not do well in this challenging environment?
Thanks,
Yeah, it is unsatisfactory. We always welcome new contributors! If you are interested in trying out https://github.com/vwxyzjn/cleanrl/pull/228 to find a set of params that work well for CartPole-v1, MountainCar-v0, and Acrobot-v1, that would be great.
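If it helps anyone picking this up, here is a hedged sketch (not the actual tuner in PR #228; `evaluate` is a dummy stand-in for "train DQN and return a normalized score") of the basic idea: sample one hyperparameter set and score it jointly across all three environments, so a single configuration must do reasonably well everywhere.

```python
# Hedged sketch of a cross-environment random search.
# All names here are illustrative, not CleanRL's API.
import random

ENVS = ["CartPole-v1", "MountainCar-v0", "Acrobot-v1"]

def evaluate(env_id, params):
    # Hypothetical stand-in for "train DQN on env_id and return a
    # normalized return in [0, 1]"; here a dummy score peaking near lr=1e-4.
    return 1.0 / (1.0 + abs(params["lr"] - 1e-4) * 1e4)

def sample_params(rng):
    return {
        "lr": 10 ** rng.uniform(-5, -3),                # log-uniform learning rate
        "exploration_fraction": rng.uniform(0.1, 0.5),  # epsilon decay window
    }

def search(trials=20, seed=0):
    rng = random.Random(seed)
    best_score, best_params = float("-inf"), None
    for _ in range(trials):
        params = sample_params(rng)
        # Average the score over all envs so one hyperparameter set must
        # perform reasonably on every game, not just one.
        score = sum(evaluate(e, params) for e in ENVS) / len(ENVS)
        if score > best_score:
            best_score, best_params = score, params
    return best_params, best_score

if __name__ == "__main__":
    params, score = search()
    print(params, score)
```

Averaging the per-environment scores is only one possible aggregation; taking the minimum instead would optimize for the worst-case game.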