Nicklas Hansen
Hello, thanks for bringing the potential issue to my attention. I will have to investigate it a bit deeper before taking any action.
Yes, this should get changed. Definitely a typo.
Good catch. I believe this would be a cleaner fix?
```python
if step >= args.init_steps:
    # catch up with init_steps updates right after the seeding phase, then one per step
    num_updates = args.init_steps if step == args.init_steps else 1
    for i in range(num_updates):
        agent.update(replay_buffer, L, ...
```
Hi, how do you save videos? The video background is applied here as post-processing: https://github.com/nicklashansen/dmcontrol-generalization-benchmark/blob/ee658ceb449b884812149b922035197be8e28c87/src/video.py#L30, although I would need to double-check that this functionality has not been broken due to...
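For context (not the benchmark's own recorder, just a minimal sketch): assuming the frames have already been rendered and post-processed, they can be written to an mp4 with imageio. The frame source below is a placeholder array rather than an actual environment render, and the output path is arbitrary.
```python
import numpy as np
import imageio

# Placeholder frames: in practice these would be the rendered observations
# after the background post-processing has been applied.
frames = [np.random.randint(0, 256, (84, 84, 3), dtype=np.uint8) for _ in range(100)]

# Write the collected frames to disk as an mp4 video.
imageio.mimsave('eval_video.mp4', frames, fps=30)
```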
This is a common challenge with RL policies unfortunately, and TD-MPC2 is no exception. A few things that I have found to help empirically when deploying our learned policies on...
1. Afaik there is no direct model-free equivalent at the moment. The policy update is similar but not equivalent to SAC, same goes for the value update. Whether you benefit...
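For readers wondering what "similar to SAC" means in practice, here is a minimal sketch of a SAC-style policy objective (entropy-regularized Q maximization with a reparameterized Gaussian policy). This is an illustration only, not the TD-MPC2 update; `actor`, `critic`, and `alpha` are hypothetical stand-ins.
```python
import torch

def sac_style_policy_loss(actor, critic, obs, alpha=0.1):
    # actor(obs) is assumed to return a torch.distributions.Normal over actions.
    dist = actor(obs)
    action = dist.rsample()                    # reparameterized sample, keeps gradients
    log_prob = dist.log_prob(action).sum(-1)   # log-density of the sampled action
    q_value = critic(obs, action)              # critic evaluates the sampled action
    # Minimizing this maximizes E[Q(s, a) - alpha * log pi(a|s)].
    return (alpha * log_prob - q_value).mean()
```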
> The way you phrased this sentence makes me think you're saying that the nature of the policy being a network predicting the parameters of a gaussian distribution leads to...
@Dobid Sorry, I missed this! I'll provide a brief answer to your questions in case it's still relevant: > The reward is supposed to be a function designed by the...
Great catch! This seems to be a leftover from the previous buffer implementation (pre commit https://github.com/nicklashansen/tdmpc2/commit/54145a4d8c4c080836ff1f186fc5a87f70c8a8c7). I'll fix this soon and double-check that everything else in the offline trainer works...
Closing this issue since it was fixed a while ago. Thanks again for reporting it!