Epoch: [1][20577/20578] Time 0.761 Data 0.001 Loss 136.7847
Epoch: [1][20578/20578] Time 0.761 Data 0.001 Loss 157.6949
Traceback (most recent call last):
File "train.py", line 401, in <module>
trainer.run(train_loader, args.config.training.num_epochs)
File "/usr/local/lib/python3.6/dist-packages/pytorch_ignite-0.1.2-py3.6.egg/ignite/engine/engine.py", line 326, in run
File "/usr/local/lib/python3.6/dist-packages/pytorch_ignite-0.1.2-py3.6.egg/ignite/engine/engine.py", line 291, in _handle_exception
File "/usr/local/lib/python3.6/dist-packages/pytorch_ignite-0.1.2-py3.6.egg/ignite/engine/engine.py", line 317, in run
File "/usr/local/lib/python3.6/dist-packages/pytorch_ignite-0.1.2-py3.6.egg/ignite/engine/engine.py", line 226, in _fire_event
File "train.py", line 269, in log_epoch
evaluator.run(train_loader)
File "/usr/local/lib/python3.6/dist-packages/pytorch_ignite-0.1.2-py3.6.egg/ignite/engine/engine.py", line 326, in run
File "/usr/local/lib/python3.6/dist-packages/pytorch_ignite-0.1.2-py3.6.egg/ignite/engine/engine.py", line 291, in _handle_exception
File "/usr/local/lib/python3.6/dist-packages/pytorch_ignite-0.1.2-py3.6.egg/ignite/engine/engine.py", line 313, in run
File "/usr/local/lib/python3.6/dist-packages/pytorch_ignite-0.1.2-py3.6.egg/ignite/engine/engine.py", line 280, in _run_once_on_dataset
File "/usr/local/lib/python3.6/dist-packages/pytorch_ignite-0.1.2-py3.6.egg/ignite/engine/engine.py", line 291, in _handle_exception
File "/usr/local/lib/python3.6/dist-packages/pytorch_ignite-0.1.2-py3.6.egg/ignite/engine/engine.py", line 273, in _run_once_on_dataset
File "/usr/local/lib/python3.6/dist-packages/pytorch_ignite-0.1.2-py3.6.egg/ignite/engine/engine.py", line 226, in _fire_event
File "/usr/local/lib/python3.6/dist-packages/torch/autograd/grad_mode.py", line 43, in decorate_no_grad
return func(*args, **kwargs)
File "/usr/local/lib/python3.6/dist-packages/pytorch_ignite-0.1.2-py3.6.egg/ignite/metrics/metric.py", line 65, in iteration_completed
File "/home/navar/savoz/aes-lac-2018/codes/metrics.py", line 17, in update
self.metrics[i].update(output)
File "/home/navar/savoz/aes-lac-2018/codes/metrics.py", line 46, in update
out, targets, out_sizes, target_sizes = output
ValueError: too many values to unpack (expected 4)
A workaround, not a solution:
I just realized the --checkpoint argument is not working, so I am using --checkpoint-per-batch instead.
The model saves fine at the end of each batch, but the code breaks at the end of training.
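For anyone hitting the same ValueError: the last frame of the traceback unpacks `output` into exactly four names, so the error suggests the evaluator is handing the metric more than four values. Below is a minimal sketch in plain Python of how that error arises and one defensive way to unpack. The helper name `unpack_metric_output` and the fifth "loss" element are hypothetical, made up for illustration; I do not know what the evaluator actually returns here.

```python
def unpack_metric_output(output):
    """Hypothetical helper: keep the first four fields, return any extras separately."""
    if len(output) < 4:
        raise ValueError(f"expected at least 4 values, got {len(output)}")
    # Starred unpacking tolerates extra trailing values instead of raising.
    out, targets, out_sizes, target_sizes, *extra = output
    return (out, targets, out_sizes, target_sizes), extra


# Assumption: the step function returns a fifth value (here a placeholder "loss").
five = ("out", "targets", "out_sizes", "target_sizes", "loss")

# Strict four-way unpacking of a five-element tuple reproduces the error.
try:
    out, targets, out_sizes, target_sizes = five
    error_message = None
except ValueError as exc:
    error_message = str(exc)

# The tolerant version separates the expected fields from whatever is left over.
fields, extra = unpack_metric_output(five)
print(error_message)
print(fields, extra)
```

Printing `len(output)` inside `update` in metrics.py would confirm whether this is really what happens before changing anything.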