Nicklas Hansen
Hello, thanks for bringing the potential issue to my attention. I will have to investigate it a bit deeper before taking any action.
Yes, this should get changed. Definitely a typo.
Good catch. I believe this would be a cleaner fix?
```python
if step >= args.init_steps:
    # catch up with init_steps updates right after the seeding phase, then one per step
    num_updates = args.init_steps if step == args.init_steps else 1
    for i in range(num_updates):
        agent.update(replay_buffer, L, ...
```
Hi, how do you save videos? The video background is applied here as post-processing: https://github.com/nicklashansen/dmcontrol-generalization-benchmark/blob/ee658ceb449b884812149b922035197be8e28c87/src/video.py#L30, although I would need to double-check that this functionality has not been broken due to...
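For context (not the benchmark's own recorder, just a minimal sketch): assuming the frames have already been rendered and post-processed, they can be written to an mp4 with imageio. The frame source below is a placeholder array rather than an actual environment render, and the output path is arbitrary.
```python
import numpy as np
import imageio

# Placeholder frames: in practice these would be the rendered observations
# after the background post-processing has been applied.
frames = [np.random.randint(0, 256, (84, 84, 3), dtype=np.uint8) for _ in range(100)]

# Write the collected frames to disk as an mp4 video.
imageio.mimsave('eval_video.mp4', frames, fps=30)
```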
This is a common challenge with RL policies unfortunately, and TD-MPC2 is no exception. A few things that I have found to help empirically when deploying our learned policies on...
1. Afaik there is no direct model-free equivalent at the moment. The policy update is similar but not equivalent to SAC, same goes for the value update. Whether you benefit...
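For readers wondering what "similar to SAC" means in practice, here is a minimal sketch of a SAC-style policy objective (entropy-regularized Q maximization with a reparameterized Gaussian policy). This is an illustration only, not the TD-MPC2 update; `actor`, `critic`, and `alpha` are hypothetical stand-ins.
```python
import torch

def sac_style_policy_loss(actor, critic, obs, alpha=0.1):
    # actor(obs) is assumed to return a torch.distributions.Normal over actions.
    dist = actor(obs)
    action = dist.rsample()                    # reparameterized sample, keeps gradients
    log_prob = dist.log_prob(action).sum(-1)   # log-density of the sampled action
    q_value = critic(obs, action)              # critic evaluates the sampled action
    # Minimizing this maximizes E[Q(s, a) - alpha * log pi(a|s)].
    return (alpha * log_prob - q_value).mean()
```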
> The way you phrased this sentence makes me think you're saying that the nature of the policy being a network predicting the parameters of a gaussian distribution leads to...
@Dobid Sorry, I missed this! I'll provide a brief answer to your questions in case it's still relevant: > The reward is supposed to be a function designed by the...
Great catch! This seems to be a leftover from the previous buffer implementation (pre commit https://github.com/nicklashansen/tdmpc2/commit/54145a4d8c4c080836ff1f186fc5a87f70c8a8c7). I'll fix this soon and double-check that everything else in the offline trainer works...
Closing this issue since it was fixed a while ago. Thanks again for reporting it!