Clarification regarding OnlineTrainer._step, env_step and action_repeat

Open aidanscannell opened this issue 1 year ago • 0 comments

Hi,

First of all, thanks for the great repo.

I want to compare my method to TD-MPC2, but I have a question about how the TD-MPC2 results are reported. The OnlineTrainer class increments self._step by 1 for each action executed in the environment. However, since TD-MPC2 uses action_repeat=2, the environment step should be calculated as self._step * action_repeat. Could you please confirm whether the .csv results files contain:

The raw _step from OnlineTrainer, which must be multiplied by action_repeat to get the environment step, or
Steps that have already been post-processed and multiplied by action_repeat?
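
To make the question concrete, here is a minimal sketch of the conversion I have in mind for the first case. The helper name to_env_steps and the sample step values are hypothetical and only for illustration; the only values taken from the above are action_repeat=2 and the relation env step = self._step * action_repeat.

```python
ACTION_REPEAT = 2  # value stated above for TD-MPC2


def to_env_steps(trainer_steps, action_repeat=ACTION_REPEAT):
    """Convert raw trainer steps (one per agent action) to environment steps.

    Hypothetical helper: assumes each agent action is repeated
    `action_repeat` times in the underlying environment.
    """
    return [step * action_repeat for step in trainer_steps]


# Made-up raw step values, standing in for a step column read from a results file.
raw_steps = [0, 50_000, 100_000]
env_steps = to_env_steps(raw_steps)
print(env_steps)
```

If the files already hold environment steps (the second case), this conversion should be skipped so the steps are not doubled twice.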

Cheers, Aidan

Sep 20 '24 08:09 aidanscannell