Diffusion-Policies-for-Offline-RL Bad performance on pen environment

Hi Zhendong,

I used the DQL code with the hyperparameters provided in the code to test the algorithm on pen-cloned-v1, but the results I got are far from what the paper reports. The average score only reaches about 28, and the critic loss explodes to an extremely large value (about 1e10).

The evaluation result is shown below: Screenshot from 2023-06-22 11-30-49

The target Q mean result is shown below: Screenshot from 2023-06-22 11-31-30

The critic loss result is shown below: Screenshot from 2023-06-22 11-32-23

I then tested it on pen-human-v1 and got similarly bad results.

Have you run into the same issue, and do you know how to solve it?

Thanks!

Jun 22 '23 03:06 cccedric

I am not sure which model selection method you are using. If you are using offline model selection, training should stop before the critic values blow up. For the online setting, I remember the critic values sometimes exploding on Adroit tasks, but that does not affect the highest performance, which is where the final model is selected. Alternatively, you could strengthen the policy regularization to keep the critic from exploding.
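To make the offline selection idea concrete, here is a minimal sketch of one possible heuristic: keep the last evaluation checkpoint recorded before the critic loss blows up. This is not the repository's actual selection code. The names `EvalRecord` and `select_offline_checkpoint`, the blow-up factor of 100, and the numbers in the usage example are all hypothetical and only for illustration.

```python
import math
from typing import List, NamedTuple, Optional


class EvalRecord(NamedTuple):
    """One logged evaluation point (hypothetical structure, not the repo's log format)."""
    step: int
    critic_loss: float
    score: float  # carried along for reporting only; never used for selection


def select_offline_checkpoint(
    records: List[EvalRecord],
    blowup_factor: float = 100.0,  # assumed threshold, tune for your task
) -> Optional[EvalRecord]:
    """Return the last record seen before the critic loss diverges.

    Divergence is declared when the loss is non-finite or exceeds
    `blowup_factor` times the smallest loss seen so far. Only the training
    signal (critic loss) is used, so no online evaluation score leaks into
    the selection.
    """
    selected = None
    running_min = float("inf")
    for rec in records:
        if not math.isfinite(rec.critic_loss):
            break
        running_min = min(running_min, rec.critic_loss)
        # Floor the reference value so a near-zero loss does not trigger a false alarm.
        if rec.critic_loss > blowup_factor * max(running_min, 1e-8):
            break
        selected = rec
    return selected


if __name__ == "__main__":
    # Made-up numbers that only mimic the shape of a diverging run.
    history = [
        EvalRecord(step=50, critic_loss=2.0, score=20.0),
        EvalRecord(step=100, critic_loss=1.5, score=26.0),
        EvalRecord(step=150, critic_loss=3.0, score=28.0),
        EvalRecord(step=200, critic_loss=1e10, score=5.0),
    ]
    print(select_offline_checkpoint(history))
```

With the made-up history above, the checkpoint at step 150 is returned, because the loss at step 200 is far more than 100 times the smallest loss seen so far. A relative threshold is used here rather than an absolute one because critic loss scales differ between tasks.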

I reran some experiments, and on my machine the performance matched.

Jun 25 '23 23:06 Zhendong-Wang

Hi, bro. Could you tell me how to visualize the data? I have trained an agent and have a file named debug.log. Your reply is very important to me. Looking forward to your reply.

Jul 05 '23 20:07 HenryZhang-git

I use TensorBoard to visualize the data :)

Jul 07 '23 14:07 cccedric

Thanks for your reply! I also want to know: did you run this project on Ubuntu and use TensorBoard for visualization? Thank you very much!

Jul 07 '23 14:07 HenryZhang-git

Yes, I run the project on Ubuntu 20.04.

Jul 07 '23 14:07 cccedric

Your reply is very helpful to me! Thank you! Looking forward to talking again.

Jul 08 '23 02:07 HenryZhang-git
