MaskablePPO doesn't set _num_timesteps_at_start so fps count is wrong when reset_num_timesteps=False in learn()

Open Naton1 opened this issue 2 years ago • 1 comments

Describe the bug MaskablePPO calculates FPS here, which checks self._num_timesteps_at_start. However MaskablePPO overrides _setup_learn which is where this gets updated (in BaseAlgorithm here), so it never updates this field. This means it's always 0, so it's incorrect when reset_num_timestamps=False.

Code example N/A (the issue is present but looking at the code I linked above)

Just passing reset_num_timesteps=False into MaskablePPO.learn() will log incorrect fps after the first call to learn

System Info

- OS: Windows-10-10.0.19044-SP0 10.0.19044
- Python: 3.9.16
- Stable-Baselines3: 1.7.0
- PyTorch: 1.13.1+cpu
- GPU Enabled: False
- Numpy: 1.24.2
- Gym: 0.21.0

Additional context For ex. here's my tensorboard graph when using the following code:

while True:
    ppo.learn(total_timesteps=1000000,
              tb_log_name='agent',
              reset_num_timesteps=False,
              callback=checkpoint_callback)

As you can see, FPS shot up right after the first 1000000 timesteps, when learn was called again.

Mar 04 '23 05:03 Naton1

Hello, it is indeed missing the fix we did a while ago for all other algos: https://github.com/DLR-RM/stable-baselines3/blob/f0382a25bdbed62dcc31875d3e540ad95c1575a5/stable_baselines3/common/base_class.py#L404

i would welcome a PR that fixes this issue =)

Mar 04 '23 10:03 araffin