super-ml-pets
Strange behaviour after continuing training after crash
After an Exception occurs, for whatever reason, the ep_rew_mean
and ep_len_mean
are much higher than usual. Are we properly resetting the environment before restarting the training? Or is there a more serious issue? Perhaps the opponents are reset to their weakest state, meaning that after restarting we are playing against easier opponents?
Note that after a while it goes back down to a similar level to what it was before the crash.
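One possible explanation that does not involve the environment at all: Stable-Baselines3 computes ep_rew_mean and ep_len_mean over a rolling window of recent episodes (its ep_info_buffer, a deque of the last 100 episodes by default). After a restart that buffer starts empty, so the first reported means come from a handful of episodes and can be far off the long-run value, then drift back as the window fills. A minimal self-contained sketch of that effect (the numbers here are made up for illustration, not taken from the logs above):

```python
from collections import deque
from statistics import mean

def ep_rew_mean(episode_rewards, window=100):
    """Mean reward over a rolling window of recent episodes,
    mimicking how SB3 aggregates its ep_info_buffer."""
    buf = deque(maxlen=window)
    for r in episode_rewards:
        buf.append(r)
    return mean(buf)

# Long-running session: the window is full of typical episodes.
before_crash = [14.0] * 200

# Fresh restart: only a few (here, unusually good) episodes logged
# so far, so the reported mean is dominated by a tiny sample.
after_restart = [40.0, 30.0, 29.0]

print(ep_rew_mean(before_crash))   # stable long-run mean
print(ep_rew_mean(after_restart))  # inflated small-sample mean
```

If this is what is happening, the spike right after the restart is a reporting artifact rather than an environment-reset bug, and the slow return to the pre-crash level is just the window refilling.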
This is the output I got around the Exception:
----------------------------------------
| rollout/ | |
| ep_len_mean | 72.6 |
| ep_rew_mean | 13.8 |
| time/ | |
| fps | 105 |
| iterations | 300 |
| time_elapsed | 5833 |
| total_timesteps | 614400 |
| train/ | |
| approx_kl | 0.06014028 |
| clip_fraction | 0.185 |
| clip_range | 0.2 |
| entropy_loss | -0.407 |
| explained_variance | 0.6 |
| learning_rate | 0.0003 |
| loss | 1.53 |
| n_updates | 48400 |
| policy_gradient_loss | -0.0404 |
| value_loss | 9.12 |
----------------------------------------
Exception: get_idx < pet-hedgehog 10-1 status-honey-bee 2-1 > not found
-----------------------------------------
| rollout/ | |
| ep_len_mean | 92.7 |
| ep_rew_mean | 33.1 |
| time/ | |
| fps | 123 |
| iterations | 1 |
| time_elapsed | 16 |
| total_timesteps | 2048 |
| train/ | |
| approx_kl | 0.043567862 |
| clip_fraction | 0.208 |
| clip_range | 0.2 |
| entropy_loss | -0.48 |
| explained_variance | 0.68 |
| learning_rate | 0.0003 |
| loss | 5.09 |
| n_updates | 48410 |
| policy_gradient_loss | -0.0425 |
| value_loss | 9.02 |
-----------------------------------------
-----------------------------------------
| rollout/ | |
| ep_len_mean | 83.9 |
| ep_rew_mean | 24.1 |
| time/ | |
| fps | 116 |
| iterations | 2 |
| time_elapsed | 35 |
| total_timesteps | 4096 |
| train/ | |
| approx_kl | 0.041116748 |
| clip_fraction | 0.179 |
| clip_range | 0.2 |
| entropy_loss | -0.366 |
| explained_variance | 0.631 |
| learning_rate | 0.0003 |
| loss | 5.58 |
| n_updates | 48420 |
| policy_gradient_loss | -0.0446 |
| value_loss | 16.3 |
-----------------------------------------
-----------------------------------------
| rollout/ | |
| ep_len_mean | 83 |
| ep_rew_mean | 21.3 |
| time/ | |
| fps | 114 |
| iterations | 3 |
| time_elapsed | 53 |
| total_timesteps | 6144 |
| train/ | |
| approx_kl | 0.056923926 |
| clip_fraction | 0.187 |
| clip_range | 0.2 |
| entropy_loss | -0.404 |
| explained_variance | 0.585 |
| learning_rate | 0.0003 |
| loss | 3.17 |
| n_updates | 48430 |
| policy_gradient_loss | -0.043 |
| value_loss | 12.3 |
-----------------------------------------
----------------------------------------
| rollout/ | |
| ep_len_mean | 83.6 |
| ep_rew_mean | 21.4 |
| time/ | |
| fps | 112 |
| iterations | 4 |
| time_elapsed | 72 |
| total_timesteps | 8192 |
| train/ | |
| approx_kl | 0.05702912 |
| clip_fraction | 0.185 |
| clip_range | 0.2 |
| entropy_loss | -0.42 |
| explained_variance | 0.476 |
| learning_rate | 0.0003 |
| loss | 3.65 |
| n_updates | 48440 |
| policy_gradient_loss | -0.0412 |
| value_loss | 14.4 |
----------------------------------------
---------------------------------------
| rollout/ | |
| ep_len_mean | 77.3 |
| ep_rew_mean | 16.2 |
| time/ | |
| fps | 112 |
| iterations | 5 |
| time_elapsed | 91 |
| total_timesteps | 10240 |
| train/ | |
| approx_kl | 0.0575137 |
| clip_fraction | 0.185 |
| clip_range | 0.2 |
| entropy_loss | -0.396 |
| explained_variance | 0.469 |
| learning_rate | 0.0003 |
| loss | 4.42 |
| n_updates | 48450 |
| policy_gradient_loss | -0.0437 |
| value_loss | 13.3 |
---------------------------------------
-----------------------------------------
| rollout/ | |
| ep_len_mean | 78.6 |
| ep_rew_mean | 17.3 |
| time/ | |
| fps | 112 |
| iterations | 6 |
| time_elapsed | 109 |
| total_timesteps | 12288 |
| train/ | |
| approx_kl | 0.077249065 |
| clip_fraction | 0.219 |
| clip_range | 0.2 |
| entropy_loss | -0.47 |
| explained_variance | 0.525 |
| learning_rate | 0.0003 |
| loss | 2.23 |
| n_updates | 48460 |
| policy_gradient_loss | -0.0489 |
| value_loss | 9.37 |
-----------------------------------------
-----------------------------------------
| rollout/ | |
| ep_len_mean | 78.3 |
| ep_rew_mean | 17.8 |
| time/ | |
| fps | 112 |
| iterations | 7 |
| time_elapsed | 127 |
| total_timesteps | 14336 |
| train/ | |
| approx_kl | 0.048610996 |
| clip_fraction | 0.182 |
| clip_range | 0.2 |
| entropy_loss | -0.408 |
| explained_variance | 0.645 |
| learning_rate | 0.0003 |
| loss | 2.42 |
| n_updates | 48470 |
| policy_gradient_loss | -0.041 |
| value_loss | 11.6 |
-----------------------------------------
----------------------------------------
| rollout/ | |
| ep_len_mean | 77.4 |
| ep_rew_mean | 17.6 |
| time/ | |
| fps | 111 |
| iterations | 8 |
| time_elapsed | 147 |
| total_timesteps | 16384 |
| train/ | |
| approx_kl | 0.04007852 |
| clip_fraction | 0.202 |
| clip_range | 0.2 |
| entropy_loss | -0.448 |
| explained_variance | 0.687 |
| learning_rate | 0.0003 |
| loss | 2.07 |
| n_updates | 48480 |
| policy_gradient_loss | -0.0456 |
| value_loss | 12 |
----------------------------------------
-----------------------------------------
| rollout/ | |
| ep_len_mean | 76.6 |
| ep_rew_mean | 17.3 |
| time/ | |
| fps | 111 |
| iterations | 9 |
| time_elapsed | 165 |
| total_timesteps | 18432 |
| train/ | |
| approx_kl | 0.056452066 |
| clip_fraction | 0.186 |
| clip_range | 0.2 |
| entropy_loss | -0.419 |
| explained_variance | 0.539 |
| learning_rate | 0.0003 |
| loss | 5.39 |
| n_updates | 48490 |
| policy_gradient_loss | -0.043 |
| value_loss | 17.7 |
-----------------------------------------
-----------------------------------------
| rollout/ | |
| ep_len_mean | 75.7 |
| ep_rew_mean | 17.1 |
| time/ | |
| fps | 110 |
| iterations | 10 |
| time_elapsed | 184 |
| total_timesteps | 20480 |
| train/ | |
| approx_kl | 0.043880884 |
| clip_fraction | 0.188 |
| clip_range | 0.2 |
| entropy_loss | -0.453 |
| explained_variance | 0.543 |
| learning_rate | 0.0003 |
| loss | 6.02 |
| n_updates | 48500 |
| policy_gradient_loss | -0.0444 |
| value_loss | 14.9 |
-----------------------------------------