[Bug]: Iteration not updated in locals while learning

Open ericrwp opened this issue 1 year ago • 1 comments

🐛 Bug

In the method stable_baselines3.common.on_policy_algorithm.OnPolicyAlgorithm.learn, the iteration value is not updated in the locals dictionary that is passed to callbacks.

To Reproduce

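from stable_baselines3 import PPO
from stable_baselines3.common.callbacks import ConvertCallback

def callback_function(v_locals, v_globals):
    # Read the iteration counter from the locals dictionary given to the callback
    iteration_index = v_locals['iteration']
    print(f'iteration_index={iteration_index}')

    # Returning True lets training continue
    return True

# Wrap the plain function so it can be passed in a callback list
checkpoint_callback = ConvertCallback(lambda x, y: callback_function(x, y))

# `env` is not defined in this snippet; it must be an existing Gymnasium environment
model = PPO("MlpPolicy", env, verbose=1)
model.learn(total_timesteps=10_000, callback=[checkpoint_callback])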

Relevant log output / Error message

iteration_index=0
iteration_index=0
iteration_index=0
----------------------------------------
| time/                   |            |
|    fps                  | 69         |
|    iterations           | 3          |
|    time_elapsed         | 87         |
|    total_timesteps      | 6144       |
| train/                  |            |
|    approx_kl            | 0.00776526 |
|    clip_fraction        | 0.0327     |
|    clip_range           | 0.2        |
|    entropy_loss         | -1.37      |
|    explained_variance   | -0.000442  |
|    learning_rate        | 0.0003     |
|    loss                 | 3.45e+04   |
|    n_updates            | 20         |
|    policy_gradient_loss | -0.0103    |
|    value_loss           | 6.32e+04   |
----------------------------------------
iteration_index=0
iteration_index=0
iteration_index=0

System Info

  • OS: Windows-11-10.0.22631-SP0 10.0.22631
  • Python: 3.12.0
  • Stable-Baselines3: 2.2.1
  • PyTorch: 2.2.0+cpu
  • GPU Enabled: False
  • Numpy: 1.26.3
  • Cloudpickle: 3.0.0
  • Gymnasium: 0.29.1

Checklist

  • [X] My issue does not relate to a custom gym environment. (Use the custom gym env template instead)
  • [X] I have checked that there is no similar issue in the repo
  • [X] I have read the documentation
  • [X] I have provided a minimal and working example to reproduce the bug
  • [X] I've used the markdown code blocks for both code and stack traces.

ericrwp avatar Feb 07 '24 09:02 ericrwp

Hello, if you want the number of iterations, the best approach is to keep a counter that you increment on every call of on_rollout_end().

araffin avatar Feb 07 '24 16:02 araffin