
Audio buffer and Padding size problems

Open teinhonglo opened this issue 4 years ago • 5 comments

Hi, I am trying to train the pase model from scratch and I get the following two errors while training: "Audio buffer is not finite everywhere" and "Padding size should be less than the corresponding input dimension". To work around the first one, I tried adding np.nan_to_num(y) before line 706 of pase/transforms.py, but I don't think that is a good solution. I have no idea how to fix either problem. Any suggestions?

Audio buffer is not finite everywhere

```
Traceback (most recent call last):
  File "train.py", line 465, in <module>
    train(opts)
  File "train.py", line 333, in train
    Trainer.train_(dloader, device=device, valid_dataloader=va_dloader)
  File "/home/teinhonglo/pase/pase/models/WorkerScheduler/trainer.py", line 223, in train_
    batch = next(iterator)
  File "/usr/local/bin/.local/lib/python3.5/site-packages/torch/utils/data/dataloader.py", line 345, in __next__
    data = self._next_data()
  File "/usr/local/bin/.local/lib/python3.5/site-packages/torch/utils/data/dataloader.py", line 838, in _next_data
    return self._process_data(data)
  File "/usr/local/bin/.local/lib/python3.5/site-packages/torch/utils/data/dataloader.py", line 881, in _process_data
    data.reraise()
  File "/usr/local/bin/.local/lib/python3.5/site-packages/torch/_utils.py", line 394, in reraise
    raise self.exc_type(msg)
librosa.util.exceptions.ParameterError: Caught ParameterError in DataLoader worker process 8.
Original Traceback (most recent call last):
  File "/usr/local/bin/.local/lib/python3.5/site-packages/torch/utils/data/_utils/worker.py", line 178, in _worker_loop
    data = fetcher.fetch(index)
  File "/usr/local/bin/.local/lib/python3.5/site-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/usr/local/bin/.local/lib/python3.5/site-packages/torch/utils/data/_utils/fetch.py", line 44, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/teinhonglo/pase/pase/dataset.py", line 492, in __getitem__
    pkg = self.transform(pkg)
  File "/usr/local/bin/.local/lib/python3.5/site-packages/torchvision/transforms/transforms.py", line 70, in __call__
    img = t(img)
  File "/home/teinhonglo/pase/pase/transforms.py", line 706, in __call__
    hop_length=self.hop,
  File "/usr/local/bin/.local/lib/python3.5/site-packages/librosa/feature/spectral.py", line 1442, in mfcc
    S = power_to_db(melspectrogram(y=y, sr=sr, **kwargs))
  File "/usr/local/bin/.local/lib/python3.5/site-packages/librosa/feature/spectral.py", line 1531, in melspectrogram
    power=power)
  File "/usr/local/bin/.local/lib/python3.5/site-packages/librosa/core/spectrum.py", line 1557, in _spectrogram
    S = np.abs(stft(y, n_fft=n_fft, hop_length=hop_length))**power
  File "/usr/local/bin/.local/lib/python3.5/site-packages/librosa/core/spectrum.py", line 161, in stft
    util.valid_audio(y)
  File "/usr/local/bin/.local/lib/python3.5/site-packages/librosa/util/utils.py", line 170, in valid_audio
    raise ParameterError('Audio buffer is not finite everywhere')
librosa.util.exceptions.ParameterError: Audio buffer is not finite everywhere
```
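For reference, the np.nan_to_num workaround mentioned above can be sketched as a small sanitizing step applied to the waveform before it reaches librosa (the `sanitize` helper name is hypothetical; nan_to_num replaces NaN with 0 and ±Inf with large finite values, so it only masks the symptom rather than fixing the corrupt input file):

```python
import numpy as np

def sanitize(y: np.ndarray) -> np.ndarray:
    """Replace NaN with 0 and +/-Inf with large finite values so
    that librosa's valid_audio check passes. This masks bad input
    rather than fixing its source (e.g. a corrupt audio file)."""
    if not np.all(np.isfinite(y)):
        y = np.nan_to_num(y)
    return y

y = np.array([0.1, np.nan, np.inf, -0.2], dtype=np.float32)
print(np.all(np.isfinite(sanitize(y))))  # True
```

A better long-term fix is usually to find and drop (or re-encode) the files that produce non-finite samples in the first place.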

Padding size should be less than the corresponding input dimension

```
Epoch 0/10:   5%|#####3    | 242/5205 [05:53<3:40:07, 2.66s/it]
Traceback (most recent call last):
  File "train.py", line 465, in <module>
    train(opts)
  File "train.py", line 333, in train
    Trainer.train_(dloader, device=device, valid_dataloader=va_dloader)
  File "/home/teinhonglo/pase/pase/models/WorkerScheduler/trainer.py", line 223, in train_
    batch = next(iterator)
  File "/usr/local/bin/.local/lib/python3.5/site-packages/torch/utils/data/dataloader.py", line 345, in __next__
    data = self._next_data()
  File "/usr/local/bin/.local/lib/python3.5/site-packages/torch/utils/data/dataloader.py", line 838, in _next_data
    return self._process_data(data)
  File "/usr/local/bin/.local/lib/python3.5/site-packages/torch/utils/data/dataloader.py", line 881, in _process_data
    data.reraise()
  File "/usr/local/bin/.local/lib/python3.5/site-packages/torch/_utils.py", line 394, in reraise
    raise self.exc_type(msg)
RuntimeError: Caught RuntimeError in DataLoader worker process 2.
Original Traceback (most recent call last):
  File "/usr/local/bin/.local/lib/python3.5/site-packages/torch/utils/data/_utils/worker.py", line 178, in _worker_loop
    data = fetcher.fetch(index)
  File "/usr/local/bin/.local/lib/python3.5/site-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/teinhonglo/.local/lib/python3.5/site-packages/torch/utils/data/_utils/fetch.py", line 44, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/teinhonglo/pase/pase/dataset.py", line 492, in __getitem__
    pkg = self.transform(pkg)
  File "/usr/local/bin/.local/lib/python3.5/site-packages/torchvision/transforms/transforms.py", line 70, in __call__
    img = t(img)
  File "/home/teinhonglo/pase/pase/transforms.py", line 427, in __call__
    pkg['chunk_rand'] = self.select_chunk(raw_rand)
  File "/home/teinhonglo/pase/pase/transforms.py", line 317, in select_chunk
    mode=self.pad_mode).view(-1)
  File "/usr/local/bin/.local/lib/python3.5/site-packages/torch/nn/functional.py", line 2868, in pad
    return torch._C._nn.reflection_pad1d(input, pad)
RuntimeError: Argument #4: Padding size should be less than the corresponding input dimension, but got: padding (0, 28656) at dimension 2 of input [1, 1, 3344]
```
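The error itself comes from a PyTorch constraint: `reflection_pad1d` requires the padding amount to be smaller than the input length, and here a 3344-sample chunk is being reflection-padded by 28656 samples, i.e. the audio file is far shorter than the configured chunk size. A minimal sketch of one possible mitigation (the `pad_chunk` helper is hypothetical, not PASE code; it falls back to zero padding when reflection would be illegal):

```python
import torch
import torch.nn.functional as F

def pad_chunk(wav: torch.Tensor, target_len: int) -> torch.Tensor:
    """Pad a [1, 1, T] waveform up to target_len samples.
    Reflection padding requires pad < T, so fall back to
    constant (zero) padding for very short inputs."""
    pad = target_len - wav.shape[-1]
    if pad <= 0:
        return wav
    mode = 'reflect' if pad < wav.shape[-1] else 'constant'
    return F.pad(wav, (0, pad), mode=mode)

short = torch.randn(1, 1, 3344)
out = pad_chunk(short, 32000)   # mode='reflect' would raise here
print(out.shape)                # torch.Size([1, 1, 32000])
```

The cleaner fix, as suggested below in this thread, is to run the segmentation script so that no file is shorter than the training chunk size.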

teinhonglo avatar Mar 11 '20 01:03 teinhonglo

Hey, did you segment your data? I think I got a similar error when I didn't.

MittalShruti avatar Mar 11 '20 06:03 MittalShruti

> Hey, did you segment your data? I think I got a similar error when I didn't.

Did you mean segmenting the data into train/valid/test sets? I wrote the train/valid lists to train.scp and the test list to test.scp, and I created symbolic links to the train/valid/test wavs in data/wavs.

teinhonglo avatar Mar 12 '20 09:03 teinhonglo

No, check the script at /data/prep/prepare_segmented_dataset_libri.py

MittalShruti avatar Mar 13 '20 09:03 MittalShruti

For me, using soundfile.read instead of torchaudio.load solved the padding issue (I used .ogg files, not wavs).

pollytur avatar Jan 12 '21 11:01 pollytur

Hello! I have been replicating this experiment recently, but while preparing the dataset config file I could not figure out where to obtain these files (`--train_scp data/LibriSpeed/libri_tr.scp --test_scp data/LibriSpeed/libri_te.scp --libri_dict data/LibriSpeed/libri_dict.npy`). I look forward to your reply. Thank you.

uuwz avatar Sep 07 '23 12:09 uuwz