
Strange behaviour after continuing training after crash

Open andreped opened this issue 1 year ago • 2 comments

After an Exception happens and training continues, the ep_rew_mean and ep_len_mean are, for whatever reason, much higher than usual. Are we properly resetting the environment before restarting the training? Or is there a more serious underlying issue? Perhaps the opponents are reset to their weakest state, meaning that after restarting we are playing against easier opponents?

Note that after a while both metrics drop back to roughly the level they were at before the crash.

This is the console output I got around the Exception:

----------------------------------------
| rollout/                |            |
|    ep_len_mean          | 72.6       |
|    ep_rew_mean          | 13.8       |
| time/                   |            |
|    fps                  | 105        |
|    iterations           | 300        |
|    time_elapsed         | 5833       |
|    total_timesteps      | 614400     |
| train/                  |            |
|    approx_kl            | 0.06014028 |
|    clip_fraction        | 0.185      |
|    clip_range           | 0.2        |
|    entropy_loss         | -0.407     |
|    explained_variance   | 0.6        |
|    learning_rate        | 0.0003     |
|    loss                 | 1.53       |
|    n_updates            | 48400      |
|    policy_gradient_loss | -0.0404    |
|    value_loss           | 9.12       |
----------------------------------------
Exception: get_idx < pet-hedgehog 10-1 status-honey-bee 2-1 > not found
-----------------------------------------
| rollout/                |             |
|    ep_len_mean          | 92.7        |
|    ep_rew_mean          | 33.1        |
| time/                   |             |
|    fps                  | 123         |
|    iterations           | 1           |
|    time_elapsed         | 16          |
|    total_timesteps      | 2048        |
| train/                  |             |
|    approx_kl            | 0.043567862 |
|    clip_fraction        | 0.208       |
|    clip_range           | 0.2         |
|    entropy_loss         | -0.48       |
|    explained_variance   | 0.68        |
|    learning_rate        | 0.0003      |
|    loss                 | 5.09        |
|    n_updates            | 48410       |
|    policy_gradient_loss | -0.0425     |
|    value_loss           | 9.02        |
-----------------------------------------
-----------------------------------------
| rollout/                |             |
|    ep_len_mean          | 83.9        |
|    ep_rew_mean          | 24.1        |
| time/                   |             |
|    fps                  | 116         |
|    iterations           | 2           |
|    time_elapsed         | 35          |
|    total_timesteps      | 4096        |
| train/                  |             |
|    approx_kl            | 0.041116748 |
|    clip_fraction        | 0.179       |
|    clip_range           | 0.2         |
|    entropy_loss         | -0.366      |
|    explained_variance   | 0.631       |
|    learning_rate        | 0.0003      |
|    loss                 | 5.58        |
|    n_updates            | 48420       |
|    policy_gradient_loss | -0.0446     |
|    value_loss           | 16.3        |
-----------------------------------------
-----------------------------------------
| rollout/                |             |
|    ep_len_mean          | 83          |
|    ep_rew_mean          | 21.3        |
| time/                   |             |
|    fps                  | 114         |
|    iterations           | 3           |
|    time_elapsed         | 53          |
|    total_timesteps      | 6144        |
| train/                  |             |
|    approx_kl            | 0.056923926 |
|    clip_fraction        | 0.187       |
|    clip_range           | 0.2         |
|    entropy_loss         | -0.404      |
|    explained_variance   | 0.585       |
|    learning_rate        | 0.0003      |
|    loss                 | 3.17        |
|    n_updates            | 48430       |
|    policy_gradient_loss | -0.043      |
|    value_loss           | 12.3        |
-----------------------------------------
----------------------------------------
| rollout/                |            |
|    ep_len_mean          | 83.6       |
|    ep_rew_mean          | 21.4       |
| time/                   |            |
|    fps                  | 112        |
|    iterations           | 4          |
|    time_elapsed         | 72         |
|    total_timesteps      | 8192       |
| train/                  |            |
|    approx_kl            | 0.05702912 |
|    clip_fraction        | 0.185      |
|    clip_range           | 0.2        |
|    entropy_loss         | -0.42      |
|    explained_variance   | 0.476      |
|    learning_rate        | 0.0003     |
|    loss                 | 3.65       |
|    n_updates            | 48440      |
|    policy_gradient_loss | -0.0412    |
|    value_loss           | 14.4       |
----------------------------------------
---------------------------------------
| rollout/                |           |
|    ep_len_mean          | 77.3      |
|    ep_rew_mean          | 16.2      |
| time/                   |           |
|    fps                  | 112       |
|    iterations           | 5         |
|    time_elapsed         | 91        |
|    total_timesteps      | 10240     |
| train/                  |           |
|    approx_kl            | 0.0575137 |
|    clip_fraction        | 0.185     |
|    clip_range           | 0.2       |
|    entropy_loss         | -0.396    |
|    explained_variance   | 0.469     |
|    learning_rate        | 0.0003    |
|    loss                 | 4.42      |
|    n_updates            | 48450     |
|    policy_gradient_loss | -0.0437   |
|    value_loss           | 13.3      |
---------------------------------------
-----------------------------------------
| rollout/                |             |
|    ep_len_mean          | 78.6        |
|    ep_rew_mean          | 17.3        |
| time/                   |             |
|    fps                  | 112         |
|    iterations           | 6           |
|    time_elapsed         | 109         |
|    total_timesteps      | 12288       |
| train/                  |             |
|    approx_kl            | 0.077249065 |
|    clip_fraction        | 0.219       |
|    clip_range           | 0.2         |
|    entropy_loss         | -0.47       |
|    explained_variance   | 0.525       |
|    learning_rate        | 0.0003      |
|    loss                 | 2.23        |
|    n_updates            | 48460       |
|    policy_gradient_loss | -0.0489     |
|    value_loss           | 9.37        |
-----------------------------------------
-----------------------------------------
| rollout/                |             |
|    ep_len_mean          | 78.3        |
|    ep_rew_mean          | 17.8        |
| time/                   |             |
|    fps                  | 112         |
|    iterations           | 7           |
|    time_elapsed         | 127         |
|    total_timesteps      | 14336       |
| train/                  |             |
|    approx_kl            | 0.048610996 |
|    clip_fraction        | 0.182       |
|    clip_range           | 0.2         |
|    entropy_loss         | -0.408      |
|    explained_variance   | 0.645       |
|    learning_rate        | 0.0003      |
|    loss                 | 2.42        |
|    n_updates            | 48470       |
|    policy_gradient_loss | -0.041      |
|    value_loss           | 11.6        |
-----------------------------------------
----------------------------------------
| rollout/                |            |
|    ep_len_mean          | 77.4       |
|    ep_rew_mean          | 17.6       |
| time/                   |            |
|    fps                  | 111        |
|    iterations           | 8          |
|    time_elapsed         | 147        |
|    total_timesteps      | 16384      |
| train/                  |            |
|    approx_kl            | 0.04007852 |
|    clip_fraction        | 0.202      |
|    clip_range           | 0.2        |
|    entropy_loss         | -0.448     |
|    explained_variance   | 0.687      |
|    learning_rate        | 0.0003     |
|    loss                 | 2.07       |
|    n_updates            | 48480      |
|    policy_gradient_loss | -0.0456    |
|    value_loss           | 12         |
----------------------------------------
-----------------------------------------
| rollout/                |             |
|    ep_len_mean          | 76.6        |
|    ep_rew_mean          | 17.3        |
| time/                   |             |
|    fps                  | 111         |
|    iterations           | 9           |
|    time_elapsed         | 165         |
|    total_timesteps      | 18432       |
| train/                  |             |
|    approx_kl            | 0.056452066 |
|    clip_fraction        | 0.186       |
|    clip_range           | 0.2         |
|    entropy_loss         | -0.419      |
|    explained_variance   | 0.539       |
|    learning_rate        | 0.0003      |
|    loss                 | 5.39        |
|    n_updates            | 48490       |
|    policy_gradient_loss | -0.043      |
|    value_loss           | 17.7        |
-----------------------------------------
-----------------------------------------
| rollout/                |             |
|    ep_len_mean          | 75.7        |
|    ep_rew_mean          | 17.1        |
| time/                   |             |
|    fps                  | 110         |
|    iterations           | 10          |
|    time_elapsed         | 184         |
|    total_timesteps      | 20480       |
| train/                  |             |
|    approx_kl            | 0.043880884 |
|    clip_fraction        | 0.188       |
|    clip_range           | 0.2         |
|    entropy_loss         | -0.453      |
|    explained_variance   | 0.543       |
|    learning_rate        | 0.0003      |
|    loss                 | 6.02        |
|    n_updates            | 48500       |
|    policy_gradient_loss | -0.0444     |
|    value_loss           | 14.9        |
-----------------------------------------
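
One thing that might be worth ruling out before blaming the environment reset: in the log above, iterations and total_timesteps start again from 1 and 2048 after the Exception, so the episode statistics may have been restarted too. Assuming the logger is Stable-Baselines3 (the table format suggests it) and that ep_rew_mean is an average over a buffer of recently finished episodes (100 by default, as far as I know), the first values after a restart would be computed from only about 20 episodes (2048 steps at ~90 steps per episode) and would be noisier. The sketch below only illustrates that effect. `rolling_means`, `spread_of_logged_mean` and the reward distribution are hypothetical and not taken from super-ml-pets.

```python
from collections import deque
import random
import statistics


def rolling_means(rewards, window=100):
    """Return the mean of the last `window` rewards after each episode."""
    buf = deque(maxlen=window)  # oldest episodes fall out once the buffer is full
    means = []
    for r in rewards:
        buf.append(r)
        means.append(statistics.fmean(buf))
    return means


def spread_of_logged_mean(n_episodes, trials=500, seed=0):
    """Std. dev. of the logged mean when the buffer holds `n_episodes` entries."""
    rng = random.Random(seed)
    finals = []
    for _ in range(trials):
        # Assumption: made-up reward distribution, not fitted to the log above.
        rewards = [rng.gauss(15.0, 20.0) for _ in range(n_episodes)]
        finals.append(rolling_means(rewards)[-1])
    return statistics.stdev(finals)


# Compare a freshly emptied buffer (~20 episodes) with a full one (100 episodes).
print("spread with  20 episodes:", round(spread_of_logged_mean(20), 2))
print("spread with 100 episodes:", round(spread_of_logged_mean(100), 2))
```

If the spread with few episodes is large enough to cover the jump seen right after the restart, sampling noise could explain part of it. It would not rule out the opponents also being reset, so that still needs checking separately.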

andreped, Aug 21 '22 10:08