MaskablePPO doesn't set _num_timesteps_at_start so fps count is wrong when reset_num_timesteps=False in learn()
Describe the bug
MaskablePPO calculates FPS here, which checks self._num_timesteps_at_start. However MaskablePPO overrides _setup_learn which is where this gets updated (in BaseAlgorithm here), so it never updates this field. This means it's always 0, so it's incorrect when reset_num_timestamps=False.
Code example N/A (the issue is present but looking at the code I linked above)
Just passing reset_num_timesteps=False into MaskablePPO.learn() will log incorrect fps after the first call to learn
System Info
- OS: Windows-10-10.0.19044-SP0 10.0.19044
- Python: 3.9.16
- Stable-Baselines3: 1.7.0
- PyTorch: 1.13.1+cpu
- GPU Enabled: False
- Numpy: 1.24.2
- Gym: 0.21.0
Additional context For ex. here's my tensorboard graph when using the following code:
while True:
ppo.learn(total_timesteps=1000000,
tb_log_name='agent',
reset_num_timesteps=False,
callback=checkpoint_callback)

As you can see, FPS shot up right after the first 1000000 timesteps, when learn was called again.
Hello, it is indeed missing the fix we did a while ago for all other algos: https://github.com/DLR-RM/stable-baselines3/blob/f0382a25bdbed62dcc31875d3e540ad95c1575a5/stable_baselines3/common/base_class.py#L404
i would welcome a PR that fixes this issue =)