Colab tensor size mismatch issue

Open • jonesmo opened this issue 2 years ago • 6 comments

Training RAVE for the first time in Colab. The cells all run successfully through resampling, but when I launch the training step, I get the following error: RuntimeError: The size of tensor a (129) must match the size of tensor b (133) at non-singleton dimension 2

I've pointed it to a directory with a total of 8.25 hours of varied-length 16 kHz audio files, and the parameters I'm using are sampling_rate=16000, multiband_number=16, n_signal=65538, size=default, prior=32.

Here's the full stack trace:

Sanity Checking: 0it [00:00, ?it/s]/content/miniconda/lib/python3.9/site-packages/torch/utils/data/dataloader.py:487: UserWarning: This DataLoader will create 8 worker processes in total. Our suggested max number of worker in current system is 2, which is smaller than what this DataLoader is going to create. Please be aware that excessive worker creation might get DataLoader running slow or even freeze, lower the worker number to avoid potential slowness/freeze if necessary.
  warnings.warn(_create_warning_msg(
Sanity Checking DataLoader 0:   0% 0/2 [00:00<?, ?it/s]Traceback (most recent call last):
  File "/content/drive/MyDrive/RAVE_COLLAB/train_rave.py", line 175, in <module>
    trainer.fit(model, train, val, ckpt_path=run)
  File "/content/miniconda/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 770, in fit
    self._call_and_handle_interrupt(
  File "/content/miniconda/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 723, in _call_and_handle_interrupt
    return trainer_fn(*args, **kwargs)
  File "/content/miniconda/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 811, in _fit_impl
    results = self._run(model, ckpt_path=self.ckpt_path)
  File "/content/miniconda/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 1236, in _run
    results = self._run_stage()
  File "/content/miniconda/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 1323, in _run_stage
    return self._run_train()
  File "/content/miniconda/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 1345, in _run_train
    self._run_sanity_check()
  File "/content/miniconda/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 1413, in _run_sanity_check
    val_loop.run()
  File "/content/miniconda/lib/python3.9/site-packages/pytorch_lightning/loops/base.py", line 204, in run
    self.advance(*args, **kwargs)
  File "/content/miniconda/lib/python3.9/site-packages/pytorch_lightning/loops/dataloader/evaluation_loop.py", line 154, in advance
    dl_outputs = self.epoch_loop.run(self._data_fetcher, dl_max_batches, kwargs)
  File "/content/miniconda/lib/python3.9/site-packages/pytorch_lightning/loops/base.py", line 204, in run
    self.advance(*args, **kwargs)
  File "/content/miniconda/lib/python3.9/site-packages/pytorch_lightning/loops/epoch/evaluation_epoch_loop.py", line 128, in advance
    output = self._evaluation_step(**kwargs)
  File "/content/miniconda/lib/python3.9/site-packages/pytorch_lightning/loops/epoch/evaluation_epoch_loop.py", line 226, in _evaluation_step
    output = self.trainer._call_strategy_hook("validation_step", *kwargs.values())
  File "/content/miniconda/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 1765, in _call_strategy_hook
    output = fn(*args, **kwargs)
  File "/content/miniconda/lib/python3.9/site-packages/pytorch_lightning/strategies/strategy.py", line 344, in validation_step
    return self.model.validation_step(*args, **kwargs)
  File "/content/drive/MyDrive/RAVE_COLLAB/rave/model.py", line 700, in validation_step
    distance = self.distance(x, y)
  File "/content/drive/MyDrive/RAVE_COLLAB/rave/model.py", line 513, in distance
    lin = sum(list(map(self.lin_distance, x, y)))
  File "/content/drive/MyDrive/RAVE_COLLAB/rave/model.py", line 503, in lin_distance
    return torch.norm(x - y) / torch.norm(x)
RuntimeError: The size of tensor a (129) must match the size of tensor b (133) at non-singleton dimension 2

Thanks for any insight or help!

jonesmo, Jan 17 '23 00:01
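
One possible reading of the numbers in the error, not confirmed anywhere in this thread: 129 and 133 are exactly the frame counts a centre-padded STFT with a hop of 512 samples would give for a 65538-sample input and for a reconstruction that has been rounded up to the next multiple of 2048 samples. The hop size, the 2048 ratio, and the rounding-up behaviour are all assumptions made for this sketch; only `n_signal=65538` and the two sizes come from the report above.

```python
import math


def stft_frames(n_samples: int, hop: int) -> int:
    """Frame count of a centre-padded STFT (1 + L // hop, as with torch.stft center=True)."""
    return 1 + n_samples // hop


def padded_length(n_samples: int, ratio: int) -> int:
    """Length after rounding up to a whole number of blocks of `ratio` samples.

    Assumption: the model output is a whole number of such blocks.
    """
    return math.ceil(n_samples / ratio) * ratio


HOP = 512     # assumption: 2048-sample window with 75% overlap
RATIO = 2048  # assumption: overall compression ratio of the model

for n_signal in (65538, 65536):
    x_frames = stft_frames(n_signal, HOP)                        # input
    y_frames = stft_frames(padded_length(n_signal, RATIO), HOP)  # reconstruction
    print(n_signal, x_frames, y_frames)
# 65538 129 133
# 65536 129 129
```

If these assumptions hold, setting n_signal to a multiple of the compression ratio (for example 65536) would make both sizes agree. Later comments report the same error on newer versions, so this may not be the only cause.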

I'm running into the same problem. Have you resolved it?

gandolfxu, Mar 08 '23 09:03

Unfortunately no

jonesmo, Mar 08 '23 15:03

Getting the same issue on the latest version.

arjunbahuguna, Apr 10 '24 08:04

Same problem when training on a local machine.

berkut0, May 03 '24 12:05

+1 Same on version 2.3.1

patrickgates, May 04 '24 17:05

I had the same issue with the latest version and solved it by modifying rave/model.py (see #309).

gwendal-lv, May 28 '24 18:05
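
The thread does not say what the change in #309 is. As a generic illustration only, one common way to make a relative distance like the `torch.norm(x - y) / torch.norm(x)` line in the trace tolerate a length mismatch is to crop both inputs to their shared length first. The sketch below uses plain Python lists standing in for the mismatched axis; `crop_to_common` and `relative_distance` are hypothetical names, not functions from rave/model.py.

```python
import math
from typing import Sequence, Tuple


def crop_to_common(x: Sequence[float], y: Sequence[float]) -> Tuple[Sequence[float], Sequence[float]]:
    """Trim both sequences to the length of the shorter one."""
    n = min(len(x), len(y))
    return x[:n], y[:n]


def relative_distance(x: Sequence[float], y: Sequence[float]) -> float:
    """||x - y|| / ||x||, computed over the part the two inputs share."""
    x, y = crop_to_common(x, y)
    diff = math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))
    ref = math.sqrt(sum(a * a for a in x))
    return diff / ref


# y has one extra trailing value, as in the 129 vs 133 mismatch.
print(relative_distance([3.0, 4.0], [3.0, 4.0, 9.0]))  # 0.0
```

Cropping hides the mismatch rather than explaining it, so it is worth checking why the two lengths differ before relying on it.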