RAVE
ValueError when resuming from previous training checkpoint
Hey RAVE team, I keep getting the same kind of error whenever I try to resume a training job that was cancelled after 24 hours on a remote training server. The error is raised in the "validation_epoch_end" hook: ValueError: not enough values to unpack (expected 2, got 0). The full stack trace is below.
Epoch 24: 0%| | 0/19333 [00:00<00:00, -25206153.85it/s]
Traceback (most recent call last):
File "/jmain02/home/RAVE/train_rave.py", line 175, in <module>
trainer.fit(model, train, val, ckpt_path=run)
File "/jmain02/home/.conda/envs/rave/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 770, in fit
self._call_and_handle_interrupt(
File "/jmain02/home/.conda/envs/rave/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 723, in _call_and_handle_interrupt
return trainer_fn(*args, **kwargs)
File "/jmain02/home/.conda/envs/rave/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 811, in _fit_impl
results = self._run(model, ckpt_path=self.ckpt_path)
File "/jmain02/home/.conda/envs/rave/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 1236, in _run
results = self._run_stage()
File "/jmain02/home/.conda/envs/rave/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 1323, in _run_stage
return self._run_train()
File "/jmain02/home/.conda/envs/rave/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 1353, in _run_train
self.fit_loop.run()
File "/jmain02/home/.conda/envs/rave/lib/python3.9/site-packages/pytorch_lightning/loops/base.py", line 204, in run
self.advance(*args, **kwargs)
File "/jmain02/home/.conda/envs/rave/lib/python3.9/site-packages/pytorch_lightning/loops/fit_loop.py", line 269, in advance
self._outputs = self.epoch_loop.run(self._data_fetcher)
File "/jmain02/home/.conda/envs/rave/lib/python3.9/site-packages/pytorch_lightning/loops/base.py", line 205, in run
self.on_advance_end()
File "/jmain02/home/.conda/envs/rave/lib/python3.9/site-packages/pytorch_lightning/loops/epoch/training_epoch_loop.py", line 255, in on_advance_end
self._run_validation()
File "/jmain02/home/.conda/envs/rave/lib/python3.9/site-packages/pytorch_lightning/loops/epoch/training_epoch_loop.py", line 309, in _run_validation
self.val_loop.run()
File "/jmain02/home/.conda/envs/rave/lib/python3.9/site-packages/pytorch_lightning/loops/base.py", line 211, in run
output = self.on_run_end()
File "/jmain02/home/.conda/envs/rave/lib/python3.9/site-packages/pytorch_lightning/loops/dataloader/evaluation_loop.py", line 188, in on_run_end
self._evaluation_epoch_end(self._outputs)
File "/jmain02/home/.conda/envs/rave/lib/python3.9/site-packages/pytorch_lightning/loops/dataloader/evaluation_loop.py", line 315, in _evaluation_epoch_end
self.trainer._call_lightning_module_hook("validation_epoch_end", output_or_outputs)
File "/jmain02/home/.conda/envs/rave/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 1595, in _call_lightning_module_hook
output = fn(*args, **kwargs)
File "/jmain02/home/RAVE/rave/model.py", line 708, in validation_epoch_end
audio, z = list(zip(*out))
ValueError: not enough values to unpack (expected 2, got 0)
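For reference, the last frame can be reproduced outside of RAVE: unpacking `list(zip(*out))` into two names fails with exactly this message when `out` is an empty list. The sketch below assumes that is what happens here, i.e. that the validation loop handed an empty output list to the hook after the resume; I have not confirmed this. The helper `unpack_validation_outputs` is a hypothetical name for illustration, not something from the RAVE codebase.

```python
def unpack_validation_outputs(out):
    """Hypothetical helper (not part of RAVE): split a list of (audio, z)
    pairs into two lists, returning None when there is nothing to unpack."""
    if not out:
        # Assumed case: no validation batches were collected for this epoch.
        return None
    audio, z = zip(*out)
    return list(audio), list(z)


# Reproduce the failure from the traceback with an empty output list.
try:
    audio, z = list(zip(*[]))
except ValueError as err:
    message = str(err)
    print(message)

# The guarded version returns None instead of raising.
print(unpack_validation_outputs([]))
```

A guard like this would only skip the end-of-epoch step when there are no outputs; it would not explain why the list is empty after resuming.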
Which version of RAVE were you using? Have you tried 2.3?