Diffusion-Policies-for-Offline-RL
Bad performance on pen environment
Hi Zhendong,
I used the DQL code with the hyperparameters provided in the repo to test the algorithm on pen-cloned-v1, but the results I got are far from what the paper reports. The average score only reaches about 28, and the critic loss explodes to a huge value (about 1e10).
Screenshots of the evaluation result, the target Q mean, and the critic loss curves are shown below.
I then tested it on pen-human-v1 and got a similarly bad result.
Have you encountered the same issue, and how can it be fixed?
Thanks!
I am not sure which model selection method you are using. If you are using offline selection, model training should stop before the critic values blow up. For the online setting, I remember sometimes seeing the critic value explode on the Adroit tasks, but it didn't affect the best performance, which is where the final model is selected. Alternatively, you could strengthen the policy regularization part to avoid the critic exploding.
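For context, the DQL actor objective combines the diffusion behavior-cloning loss with an eta-weighted Q-maximization term, so strengthening the regularization in practice means reducing eta (the Q weight) relative to the BC term. Below is a minimal sketch of that combination; the function name, the dummy tensors, and the |Q| normalization follow the paper's description, but treat the exact form as illustrative rather than the repository's code.

```python
import torch

def dql_actor_loss(bc_loss, q_values, eta=1.0):
    """Combine the diffusion behavior-cloning loss with an eta-weighted
    Q-maximization term (DQL-style objective). A smaller eta means stronger
    BC regularization, which tends to keep the critic stable on narrow
    Adroit datasets such as pen-cloned/pen-human."""
    # Normalize by the mean |Q| (detached) so eta has a comparable
    # meaning across tasks with different reward scales.
    q_loss = -q_values.mean() / q_values.abs().mean().detach()
    return bc_loss + eta * q_loss

# Illustrative usage with dummy tensors:
bc_loss = torch.tensor(0.25)          # diffusion denoising loss on dataset actions
q_values = torch.randn(256)           # Q(s, a) for a batch of policy actions
loss = dql_actor_loss(bc_loss, q_values, eta=0.5)
```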
I reran some experiments, and on my machine the performance matched the paper.
Hi, bro. Could you tell me how to visualize the data? I have trained an agent and have a file named debug.log. Your reply is very important to me. Looking forward to your reply.
I use TensorBoard to visualize the data :)
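In case it helps, here is a minimal sketch of replaying logged numbers into a TensorBoard event file so they can be browsed with `tensorboard --logdir runs/`. The file name `progress.csv` and the column names are assumptions about what the logger writes alongside debug.log, so adjust them to whatever your run directory actually contains.

```python
# Sketch: convert a logged CSV of evaluation scores into TensorBoard scalars.
import csv
from torch.utils.tensorboard import SummaryWriter

writer = SummaryWriter(log_dir="runs/pen-cloned-v1")
with open("results/pen-cloned-v1/progress.csv") as f:      # assumed location
    for row in csv.DictReader(f):
        step = int(float(row["trainer/epoch"]))             # assumed column name
        writer.add_scalar("eval/avg_reward",
                          float(row["eval/avg_reward"]),    # assumed column name
                          step)
writer.close()
# Then launch: tensorboard --logdir runs/
```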
Thanks for your reply! I also want to know: did you run this project on Ubuntu and use TensorBoard for visualization? Thank you very much!
Yes, I ran the project on Ubuntu 20.04.
Your reply is very helpful to me! Thank you! Looking forward to our next exchange!