
DQN on MountainCar

Open qsh-zh opened this issue 1 year ago • 3 comments


Problem Description

The PyTorch DQN fails on MountainCar. I tried the two settings from the issue.

Checklist

Current Behavior

image

Expected Behavior

DQN should learn the policy.

Possible Solution

I am not sure what can be done. It is quite surprising that DQN fails on such a simple env.

Steps to Reproduce

The modifications in the hotfix are the same as in the issue.

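# Assumed import: CleanRL's dqn.py takes ReplayBuffer from Stable-Baselines3
from stable_baselines3.common.buffers import ReplayBuffer

# DQN-hotfix: a time-limit cutoff is stored like any other episode end
rb = ReplayBuffer(
    args.buffer_size,
    envs.single_observation_space,
    envs.single_action_space,
    device,
    handle_timeout_termination=False,
)

# DQN: time-limit cutoffs are tracked separately from true terminations
rb = ReplayBuffer(
    args.buffer_size,
    envs.single_observation_space,
    envs.single_action_space,
    device,
    handle_timeout_termination=True,
)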
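
For context, here is a minimal sketch of what the flag changes in the one-step TD target. It assumes the Stable-Baselines3 behavior where, with `handle_timeout_termination=True`, the sampled `dones` are masked by `(1 - timeouts)` so that a time-limit cutoff still bootstraps from the next state. The function name `td_targets` and the Q-value of -50 are hypothetical and only for illustration; this is not CleanRL's actual code.

```python
import numpy as np


def td_targets(rewards, next_q_max, dones, timeouts,
               gamma=0.99, handle_timeout_termination=True):
    """One-step DQN targets: r + gamma * max_a Q(s', a) * (1 - done)."""
    rewards = np.asarray(rewards, dtype=np.float64)
    next_q_max = np.asarray(next_q_max, dtype=np.float64)
    dones = np.asarray(dones, dtype=np.float64)
    timeouts = np.asarray(timeouts, dtype=np.float64)
    if handle_timeout_termination:
        # A time-limit cutoff is not a true terminal state,
        # so clear the done flag and keep bootstrapping.
        dones = dones * (1.0 - timeouts)
    return rewards + gamma * next_q_max * (1.0 - dones)


# Hypothetical last transition of a MountainCar-v0 episode that hit the step limit:
# reward -1, done=1 because of the time limit, made-up next-state value of -50.
targets_true = td_targets([-1.0], [-50.0], [1.0], [1.0],
                          handle_timeout_termination=True)
targets_false = td_targets([-1.0], [-50.0], [1.0], [1.0],
                           handle_timeout_termination=False)
print(targets_true, targets_false)
```

With the flag on, the target for the cut-off transition still includes the discounted next-state value; with it off, the target collapses to the immediate reward. In MountainCar-v0 most early episodes end at the step limit, so the two settings train on noticeably different targets.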

qsh-zh avatar Aug 06 '22 19:08 qsh-zh

Hello, thanks for reporting. Could you check whether your performance matches the reported performance in the docs? https://docs.cleanrl.dev/rl-algorithms/dqn/#experiment-results_1

Basically, the performance is not that great because I found it difficult to find a set of hyperparameters that works well for all three games we have tested.

vwxyzjn avatar Aug 06 '22 20:08 vwxyzjn

@vwxyzjn Thanks for your fast response. I think the performance almost matches what we have in the docs.

Except for the second random seed, seeds 1 and 3 behave very similarly in my experiments (they never show any improvement over a random policy).

Do you think the unsatisfying performance is due to suboptimal hyperparameters? Or can DQN simply not do well in this challenging env?

Thanks,

qsh-zh avatar Aug 06 '22 21:08 qsh-zh

Yeah, it is unsatisfactory. We always welcome new contributors! If you are interested in trying out https://github.com/vwxyzjn/cleanrl/pull/228 to find a set of params that works well for CartPole-v1, MountainCar-v0, and Acrobot-v1, that would be great.

vwxyzjn avatar Aug 06 '22 21:08 vwxyzjn