cleanrl DQN on MountainCar


Problem Description

PyTorch DQN fails to learn on MountainCar. I tried the two settings described in this issue.

Checklist

[x] I have installed dependencies via poetry install (see CleanRL's installation guideline).
[x] I have checked that there is no similar issue in the repo (required)

Current Behavior

Expected Behavior

DQN should learn the policy.

Possible Solution

I'm not sure what can be done. It is quite surprising that DQN fails on such a simple environment.

Steps to Reproduce

The hotfix modification is the same as in the issue: the two runs differ only in the handle_timeout_termination argument passed to ReplayBuffer.

# DQN-hotfix: time-limit truncations are stored as ordinary terminations
rb = ReplayBuffer(
    args.buffer_size,
    envs.single_observation_space,
    envs.single_action_space,
    device,
    handle_timeout_termination=False,
)

# DQN (default): time-limit truncations are handled separately from true terminations
rb = ReplayBuffer(
    args.buffer_size,
    envs.single_observation_space,
    envs.single_action_space,
    device,
    handle_timeout_termination=True,
)
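For context, the two runs differ only in how time-limit truncations enter the TD target. CleanRL's DQN uses Stable-Baselines3's ReplayBuffer, and my understanding is that with handle_timeout_termination=True it masks the stored done flag with a separate timeout flag when sampling, so a truncated transition still bootstraps from the next state. The sketch below illustrates that effect under the assumption of a one-step target y = r + gamma * (1 - done) * max_a' Q_target(s', a'); td_targets is a hypothetical helper, not a CleanRL or SB3 function.

```python
import numpy as np


def td_targets(rewards, next_q_max, dones, timeouts, gamma=0.99,
               handle_timeout_termination=True):
    """Hypothetical one-step DQN targets: y = r + gamma * (1 - done) * max_a' Q(s', a')."""
    rewards = np.asarray(rewards, dtype=np.float64)
    next_q_max = np.asarray(next_q_max, dtype=np.float64)
    dones = np.asarray(dones, dtype=np.float64)
    timeouts = np.asarray(timeouts, dtype=np.float64)
    if handle_timeout_termination:
        # A time-limit truncation is not a real terminal state: clear its done
        # flag so the target keeps bootstrapping from the next state.
        dones = dones * (1.0 - timeouts)
    return rewards + gamma * (1.0 - dones) * next_q_max


# A truncated MountainCar step: reward -1, done=1 only because of the time limit,
# and an (assumed) target-network estimate of -50 for the next state.
with_handling = td_targets([-1.0], [-50.0], [1.0], [1.0], handle_timeout_termination=True)
without_handling = td_targets([-1.0], [-50.0], [1.0], [1.0], handle_timeout_termination=False)
print(with_handling, without_handling)  # bootstrapped target vs. plain reward
```

On MountainCar-v0, where every step gives -1 and nearly all early episodes end at the time limit, treating truncations as terminal (the hotfix setting) makes those final states look much better than they are.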

Aug 06 '22 19:08 qsh-zh

Hello, thanks for reporting. Could you check whether your performance matches the reported performance in the docs? https://docs.cleanrl.dev/rl-algorithms/dqn/#experiment-results_1

Basically, the performance is not that great, as I found it difficult to find a set of hyperparameters that works well for all three environments we tested.

Aug 06 '22 20:08 vwxyzjn

@vwxyzjn Thanks for your fast response. I think the performance almost matches what we have in the docs.

Apart from the second random seed, seeds 1 and 3 behave very similarly in my experiments (they never show any improvement over a random policy).
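To make "no improvement over a random policy" concrete: MountainCar-v0 gives a reward of -1 per step and is cut off after 200 steps, so an episode return of -200 means the car never reached the goal, which is where a random policy typically ends up. The sketch below is a hypothetical pure-Python re-implementation of the MountainCar-v0 dynamics (constants as I understand them from Gym's classic-control environment, not imported from Gym) that scores a uniformly random policy with a seeded RNG; step, run_episode, and random_policy are illustrative names, not part of Gym or CleanRL.

```python
import math
import random

# Assumed MountainCar-v0 constants (re-implemented here, not imported from Gym).
MIN_POS, MAX_POS = -1.2, 0.6
MAX_SPEED = 0.07
GOAL_POS, GOAL_VEL = 0.5, 0.0
FORCE, GRAVITY = 0.001, 0.0025
TIME_LIMIT = 200  # MountainCar-v0 is wrapped in a 200-step time limit


def step(position, velocity, action):
    """Apply one action (0 = push left, 1 = no push, 2 = push right)."""
    velocity += (action - 1) * FORCE + math.cos(3 * position) * (-GRAVITY)
    velocity = max(-MAX_SPEED, min(MAX_SPEED, velocity))
    position += velocity
    position = max(MIN_POS, min(MAX_POS, position))
    if position == MIN_POS and velocity < 0:
        velocity = 0.0  # inelastic stop at the left wall
    terminated = position >= GOAL_POS and velocity >= GOAL_VEL
    return position, velocity, -1.0, terminated


def run_episode(policy, rng):
    """Return the undiscounted episode return, truncating at TIME_LIMIT steps."""
    position, velocity = rng.uniform(-0.6, -0.4), 0.0
    episode_return = 0.0
    for _ in range(TIME_LIMIT):
        action = policy(position, velocity, rng)
        position, velocity, reward, terminated = step(position, velocity, action)
        episode_return += reward
        if terminated:
            break
    return episode_return


def random_policy(position, velocity, rng):
    return rng.randrange(3)  # uniform over the three discrete actions


rng = random.Random(0)
returns = [run_episode(random_policy, rng) for _ in range(20)]
print(f"mean random-policy return over 20 episodes: {sum(returns) / len(returns):.1f}")
```

A policy that never pushes makes a useful sanity check: it only oscillates near the bottom of the valley and always scores -200.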

Do you think the unsatisfying performance is due to suboptimal hyperparameters, or can DQN simply not do well in this challenging environment?

Thanks,

Aug 06 '22 21:08 qsh-zh

Yeah, it is unsatisfactory. We always welcome new contributors! If you are interested in trying out https://github.com/vwxyzjn/cleanrl/pull/228 to find a set of hyperparameters that works well for CartPole-v1, MountainCar-v0, and Acrobot-v1, that would be great.

Aug 06 '22 21:08 vwxyzjn
