Question for training dreamer_v3 on InvertedPendulum-v4

Open anthony0727 opened this issue 10 months ago • 1 comment

I ran the command below to train on InvertedPendulum-v4:

MUJOCO_GL=egl python sheeprl.py exp=dreamer_v3 env=mujoco env.id=InvertedPendulum-v4 fabric.accelerator=cuda fabric.devices=1 fabric.strategy=ddp algo.mlp_keys.encoder=[state] algo.mlp_keys.decoder=[state] algo.cnn_keys.encoder=[] algo.cnn_keys.decoder=[]

but the agent is not learning (it is expected to reach a score of 1000):

Rank-0: policy_step=829396, reward_env_1=4.0
Rank-0: policy_step=829400, reward_env_2=4.0
Rank-0: policy_step=829404, reward_env_0=4.0
Rank-0: policy_step=829404, reward_env_3=4.0
Rank-0: policy_step=829412, reward_env_1=4.0
Rank-0: policy_step=829416, reward_env_2=4.0

Am I missing something?

Feb 07 '25 05:02 anthony0727

Hi @anthony0727, have you tried training with different seeds?
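
For reference, a seed sweep over the same command might look like the minimal sketch below. It assumes sheeprl accepts a top-level `seed` Hydra override, which is worth confirming against the config files in your checkout. The `run_seed` helper, the `DRY_RUN` switch, and the seed values 1, 2, 3 are hypothetical and only for illustration.

```shell
#!/bin/sh
# Sketch: repeat the run from this issue with several seeds.
# ASSUMPTION: `seed` is a valid top-level Hydra override in sheeprl's config.
# DRY_RUN=1 (default here) only prints each command; set DRY_RUN=0 to train.
DRY_RUN="${DRY_RUN:-1}"

# run_seed is a hypothetical helper: $1 is the seed to use.
run_seed() {
  # Rebuild the argument list as the original command plus the seed override.
  # Bracketed values are quoted so the shell does not treat them as globs.
  set -- python sheeprl.py exp=dreamer_v3 env=mujoco \
    env.id=InvertedPendulum-v4 \
    fabric.accelerator=cuda fabric.devices=1 fabric.strategy=ddp \
    "algo.mlp_keys.encoder=[state]" "algo.mlp_keys.decoder=[state]" \
    "algo.cnn_keys.encoder=[]" "algo.cnn_keys.decoder=[]" \
    "seed=$1"
  if [ "$DRY_RUN" = "1" ]; then
    echo "MUJOCO_GL=egl $*"
  else
    MUJOCO_GL=egl "$@"
  fi
}

# The seed values are arbitrary.
for seed in 1 2 3; do
  run_seed "$seed"
done
```

If every seed ends with the same episode return of 4.0, seed sensitivity is unlikely to be the cause and the configuration itself is the next thing to check.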

Mar 03 '25 10:03 belerico