super-ml-pets
Training seems to crash occasionally
When training RL models using sapai-gym, different errors tend to occur.
I have tried to use try/except blocks, but the problem is that when an error occurs, training with Stable-Baselines3 crashes, and we have to start all over again.
We should therefore either: 1) fix what is bugged in sapai/sapai-gym or 2) add a wrapper function that catches when this fails, and tries to generate a new one (if possible).
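For option 2, here is a minimal sketch of what such a wrapper could look like. It is not the actual sapai-gym code: `CrashSafeEnv` is a hypothetical name, and it assumes the older gym-style API where `step()` returns `(obs, reward, done, info)` and `reset()` returns only the observation. When the inner environment raises, the wrapper resets it and reports the episode as finished with zero reward.

```python
class CrashSafeEnv:
    """Hypothetical wrapper that recovers from exceptions raised by env.step().

    Assumes the old gym API: step() -> (obs, reward, done, info),
    reset() -> obs. This is an illustration, not part of sapai-gym.
    """

    def __init__(self, env):
        self.env = env
        self.crash_count = 0  # how many times step() failed so far

    def reset(self):
        return self.env.reset()

    def step(self, action):
        try:
            return self.env.step(action)
        except Exception as exc:
            # The inner env is in an unknown state, so generate a fresh one
            # and end the current episode without any reward.
            self.crash_count += 1
            obs = self.env.reset()
            info = {"crashed": True, "error": repr(exc)}
            return obs, 0.0, True, info
```

The trade-off is that the crashed episode is silently truncated, so the underlying bug stays hidden unless `crash_count` or `info["error"]` is logged.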
I've added a temporary fix for this, which essentially catches when this happens, and restarts training from the previous state, keeping all model history and whatnot.
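Roughly, the idea behind the temporary fix can be sketched as below. This is a simplified illustration, not the exact code in train_agent.py. `train_with_restarts` and `load_latest_model` are hypothetical names; `load_latest_model` is assumed to return a model restored from the most recent checkpoint (or a fresh model if none exists). The only things assumed about the model are the SB3-style `num_timesteps` attribute and `learn(total_timesteps=..., reset_num_timesteps=False)`.

```python
def train_with_restarts(load_latest_model, target_timesteps, max_restarts=10):
    """Keep training until target_timesteps is reached, resuming after crashes.

    load_latest_model: hypothetical zero-argument callable that returns a
    model restored from the latest checkpoint.
    """
    restarts = 0
    while True:
        model = load_latest_model()
        remaining = target_timesteps - model.num_timesteps
        if remaining <= 0:
            return model
        try:
            # reset_num_timesteps=False continues the step counter
            # instead of starting again from zero.
            model.learn(total_timesteps=remaining, reset_num_timesteps=False)
            return model
        except Exception as exc:
            restarts += 1
            if restarts > max_restarts:
                raise  # give up and surface the original error
            print(f"Training crashed ({exc!r}); restart {restarts}/{max_restarts}")
```

How much progress is lost per crash depends on how often checkpoints are written, so a fairly short checkpoint interval helps here.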
Need a proper fix for this in sapai/sapai-gym.
As I assumed all errors were coming from sapai-gym, I added a fix to catch all errors happening there: https://github.com/andreped/sapai-gym/commit/7443f36944466316efbb5f0c35d91593cc7a50e5
However, to my surprise, when running a regular training (now without the try/except loop in the main training script train_agent.py), I got an error from within sb3 (sb3_contrib, per the traceback). This is more challenging to solve, and I am not really sure what is causing it. See the traceback below, raised after about 250k steps:
Traceback (most recent call last):
  File ".\main.py", line 28, in <module>
    train_with_masks(ret)
  File "C:\Users\andrp\workspace\super-ml-pets\src\train_agent.py", line 60, in train_with_masks
    model.learn(total_timesteps=ret.nb_steps, callback=checkpoint_callback)
  File "C:\Users\andrp\workspace\super-ml-pets\venv38\lib\site-packages\sb3_contrib\ppo_mask\ppo_mask.py", line 579, in learn
    self.train()
  File "C:\Users\andrp\workspace\super-ml-pets\venv38\lib\site-packages\sb3_contrib\ppo_mask\ppo_mask.py", line 439, in train
    values, log_prob, entropy = self.policy.evaluate_actions(
  File "C:\Users\andrp\workspace\super-ml-pets\venv38\lib\site-packages\sb3_contrib\common\maskable\policies.py", line 280, in evaluate_actions
    distribution.apply_masking(action_masks)
  File "C:\Users\andrp\workspace\super-ml-pets\venv38\lib\site-packages\sb3_contrib\common\maskable\distributions.py", line 152, in apply_masking
    self.distribution.apply_masking(masks)
  File "C:\Users\andrp\workspace\super-ml-pets\venv38\lib\site-packages\sb3_contrib\common\maskable\distributions.py", line 62, in apply_masking
    super().__init__(logits=logits)
  File "C:\Users\andrp\workspace\super-ml-pets\venv38\lib\site-packages\torch\distributions\categorical.py", line 64, in __init__
    super(Categorical, self).__init__(batch_shape, validate_args=validate_args)
  File "C:\Users\andrp\workspace\super-ml-pets\venv38\lib\site-packages\torch\distributions\distribution.py", line 55, in __init__
    raise ValueError(
ValueError: Expected parameter probs (Tensor of shape (64, 213)) of distribution MaskableCategorical(probs: torch.Size([64, 213]), logits: torch.Size([64, 213])) to satisfy the constraint Simplex(), but found invalid values:
tensor([[4.9590e-11, 2.1976e-10, 6.1887e-01, ..., 3.3524e-13, 4.5890e-12,
5.3164e-14],
[1.4266e-06, 8.7648e-10, 1.3233e-06, ..., 1.5695e-07, 2.9451e-08,
1.5212e-07],
[2.2623e-06, 2.3994e-09, 5.3787e-07, ..., 3.9735e-08, 2.8777e-09,
2.6170e-08],
...,
[1.6828e-12, 4.9032e-04, 9.5983e-13, ..., 1.7402e-13, 1.9223e-13,
5.6725e-14],
[4.7819e-10, 7.7589e-03, 7.8509e-18, ..., 6.4911e-11, 8.8994e-12,
8.3013e-11],
[3.6789e-08, 1.2760e-07, 4.7924e-16, ..., 8.6682e-09, 8.6489e-10,
3.7913e-08]], grad_fn=<SoftmaxBackward0>)
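To narrow down which rows trip the Simplex() constraint, a small diagnostic like the one below might help. It is only a sketch: `find_invalid_rows` is a hypothetical helper, and the tolerance of 1e-6 is my assumption about how strict the check is. It flags rows that contain NaN, contain negative values, or do not sum to 1 within the tolerance.

```python
import math


def find_invalid_rows(probs, tol=1e-6):
    """Return (row_index, reason) for each row that is not a valid distribution.

    probs: nested list of floats, one row per sample in the batch.
    tol: assumed tolerance for the row sum (not taken from the traceback).
    """
    bad = []
    for i, row in enumerate(probs):
        if any(math.isnan(p) for p in row):
            bad.append((i, "nan"))
        elif any(p < 0 for p in row):
            bad.append((i, "negative"))
        elif abs(sum(row) - 1.0) >= tol:
            bad.append((i, "sum"))
    return bad
```

With a torch tensor, the rows can be passed in as `probs.tolist()`. If the flagged rows are all of the "sum" kind and only off by rounding error, that would point to float precision rather than a bug in the action masks. In that case one possible workaround is `torch.distributions.Distribution.set_default_validate_args(False)`, though that disables the check entirely rather than fixing the cause.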
A seemingly random exception occurs after training for thousands of steps:
Exception: get_idx < pet-hedgehog 10-1 status-honey-bee 2-1 > not found
What is causing this?