
Wandb does not seem to record time or step correctly

Open DanielTakeshi opened this issue 3 years ago • 3 comments

I am running PPO with wandb integration, but the statistics seem to not be recorded as intended.

I am testing this with Isaac Gym environments but I am unsure if this issue is specific to Isaac Gym.

Steps to reproduce: after installing IsaacGymEnvs according to its instructions, run a command like this in the isaacgymenvs/ directory:

python train.py task=Ant headless=True wandb_activate=True wandb_entity=danieltakeshi wandb_project=isaac-gym

Replace danieltakeshi with your username and isaac-gym with your project.

After I run this, the reward goes up (good) but I also see this on wandb:

Screenshot from 2022-10-27 13-23-04

The code is recording the reward as a function of iter, step, and time. It stores it in rl_games here:

https://github.com/Denys88/rl_games/blob/d8645b2678c0d8a6e98a6e3f2b17f0ecfbff71ad/rl_games/common/a2c_common.py#L947-L955

The code stores the statistics against three different quantities (epoch, step, and time) through self.writer, which is a tensorboardX.SummaryWriter. On wandb, however, the plots seem to only show the x-axis as "iter" (which is the same as epoch_num here), and they don't show performance as a function of step or time. Is there a way to address this?
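To make the logging pattern concrete, here is a minimal sketch of what "the same reward stored against three quantities" looks like. It is not the rl_games code itself: RecordingWriter is a hypothetical stand-in that only mimics the add_scalar(tag, scalar_value, global_step) call shape of tensorboardX.SummaryWriter, and the argument names frame and total_time are assumptions chosen for illustration.

```python
class RecordingWriter:
    """Hypothetical stand-in for tensorboardX.SummaryWriter.

    It only records add_scalar calls so the pattern can be inspected
    without installing tensorboardX.
    """

    def __init__(self):
        self.events = []

    def add_scalar(self, tag, scalar_value, global_step=None):
        # Store (tag, y-value, x-value) exactly as passed in.
        self.events.append((tag, scalar_value, global_step))


def log_reward(writer, mean_reward, frame, epoch_num, total_time):
    # Same y-value under three tags; each tag gets a different quantity
    # as its x-value, always passed through the global_step argument.
    writer.add_scalar("rewards/step", mean_reward, frame)
    writer.add_scalar("rewards/iter", mean_reward, epoch_num)
    writer.add_scalar("rewards/time", mean_reward, total_time)


writer = RecordingWriter()
log_reward(writer, mean_reward=12.5, frame=65536, epoch_num=4, total_time=3.2)
for tag, value, x in writer.events:
    print(f"{tag}: y={value}, x={x}")
```

In this pattern the intended x-value of every plot travels in global_step, so a dashboard that plots against its own internal step counter instead of that field would show all three tags on the same axis.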

(Also posting on the Isaac Gym repo https://github.com/NVIDIA-Omniverse/IsaacGymEnvs/issues/87)

DanielTakeshi avatar Oct 27 '22 17:10 DanielTakeshi

@DanielTakeshi I am sorry I missed your issue. @vwxyzjn could you take a look if you have free time?

Denys88 avatar Nov 28 '22 23:11 Denys88

Try changing the x-axis to global_step using the button on the top right.

vwxyzjn avatar Nov 28 '22 23:11 vwxyzjn

Sorry for my delayed response as well, @Denys88 and @vwxyzjn.

It looks like we can adjust the x-values here:

Screen Shot 2022-12-28 at 10 40 53 AM

So I think the intended usage here is that we are supposed to adjust rewards/time so that the x-axis has Wall Time and rewards/step so that it uses global_step? (Somewhat confusingly, rewards/iter seems fine with the normal Step though it is clear in the code that iter is supposed to refer to an epoch.)

Screen Shot 2022-12-28 at 10 42 04 AM

It would be nice if there were a way to automatically set all three plots so that they use the appropriate x-axis from the start. I'm not sure whether such a feature is available.
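One possible direction, sketched under assumptions: wandb provides define_metric(name, step_metric=...), which assigns a custom x-axis to a metric when values are logged directly with wandb.log. Whether this also takes effect for metrics that arrive through TensorBoard syncing is not something I have verified. The axis names frame, epoch, and total_time and both helper functions below are hypothetical, and define_metric is passed in as a callable so the sketch runs without wandb installed.

```python
# Hypothetical mapping from each reward plot to the x-axis it should use.
AXIS_FOR_METRIC = {
    "rewards/step": "frame",
    "rewards/iter": "epoch",
    "rewards/time": "total_time",
}


def configure_axes(define_metric, axis_for_metric=AXIS_FOR_METRIC):
    """Register each x-axis, then bind every metric to its axis.

    define_metric is expected to behave like wandb.define_metric,
    i.e. accept (name, step_metric=...).
    """
    for axis in sorted(set(axis_for_metric.values())):
        define_metric(axis)
    for metric, axis in axis_for_metric.items():
        define_metric(metric, step_metric=axis)


def reward_payload(mean_reward, frame, epoch, total_time):
    """Build one log entry carrying the y-values and all three x-values."""
    payload = {"frame": frame, "epoch": epoch, "total_time": total_time}
    for metric in AXIS_FOR_METRIC:
        payload[metric] = mean_reward
    return payload


# Intended use with wandb (assumption: direct logging, not TensorBoard sync):
#   import wandb
#   wandb.init(project="isaac-gym")
#   configure_axes(wandb.define_metric)
#   wandb.log(reward_payload(mean_reward, frame, epoch, total_time))
```

Logging the x-values in the same dictionary as the rewards is what lets each plot pick up its own axis. The trade-off is that this bypasses self.writer, so it would need to live alongside the existing TensorBoard logging rather than replace it.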

If this is the intended usage, feel free to close this issue report. Thanks!

DanielTakeshi avatar Dec 28 '22 15:12 DanielTakeshi