tdmpc2
Clarification regarding OnlineTrainer._step, env_step and action_repeat
Hi,
First of all, thanks for the great repo.
I want to compare my method to TD-MPC2, but I have a question about how the TD-MPC2 results were reported. The `OnlineTrainer` class increases `self._step` by 1 for each action executed in the environment. However, since TD-MPC2 uses `action_repeat=2`, the environment step should be calculated as `self._step * action_repeat`. Could you please confirm whether the .csv results files contain:
- The raw `_step` from `OnlineTrainer`, such that the values need to be multiplied by `action_repeat` to get the environment step.
- Or have they been post-processed and multiplied by `action_repeat` already?
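
For concreteness, here is a minimal sketch of the conversion I would apply if the files turn out to hold the raw counter. The column names `step` and `reward`, the helper `to_env_steps`, and the sample values are my own assumptions for illustration and are not taken from the repo or its results files.

```python
import csv
import io

ACTION_REPEAT = 2  # value mentioned above for TD-MPC2


def to_env_steps(csv_text, step_column="step", action_repeat=ACTION_REPEAT):
    """Add an 'env_step' field equal to the raw step times action_repeat.

    `step_column` is a hypothetical column name; adjust it to the real header.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = []
    for row in reader:
        raw_step = int(float(row[step_column]))  # tolerate "50000" or "50000.0"
        row["env_step"] = raw_step * action_repeat
        rows.append(row)
    return rows


# Made-up data purely for illustration, not actual TD-MPC2 results.
example = "step,reward\n0,0.0\n50000,123.4\n100000,456.7\n"
for row in to_env_steps(example):
    print(row["step"], "->", row["env_step"])
```

If the files are already post-processed, this step would double-count the action repeat, which is why I would like to confirm before plotting.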
Cheers, Aidan