muzic
[MusicBERT] Dataloader bug during pretraining
Hi, I preprocessed the raw dataset following the README, but ran into a bug during pretraining. Here is the full stack trace:
Traceback (most recent call last):
File "/usr/local/python3/lib/python3.6/site-packages/torch/multiprocessing/spawn.py", line 19, in _wrap
fn(i, *args)
File "/data/user/muzic/musicbert/fairseq/fairseq/distributed_utils.py", line 302, in distributed_main
main(cfg, **kwargs)
File "/data/user/muzic/musicbert/fairseq/fairseq_cli/train.py", line 137, in main
valid_losses, should_stop = train(cfg, trainer, task, epoch_itr)
File "/usr/local/python3/lib/python3.6/contextlib.py", line 52, in inner
return func(*args, **kwds)
File "/data/user/muzic/musicbert/fairseq/fairseq_cli/train.py", line 233, in train
for i, samples in enumerate(progress):
File "/data/user/muzic/musicbert/fairseq/fairseq/logging/progress_bar.py", line 256, in __iter__
for i, obj in enumerate(self.iterable, start=self.n):
File "/data/user/muzic/musicbert/fairseq/fairseq/data/iterators.py", line 59, in __iter__
for x in self.iterable:
File "/data/user/muzic/musicbert/fairseq/fairseq/data/iterators.py", line 473, in _chunk_iterator
for x in itr:
File "/data/user/muzic/musicbert/fairseq/fairseq/data/iterators.py", line 59, in __iter__
for x in self.iterable:
File "/data/user/muzic/musicbert/fairseq/fairseq/data/iterators.py", line 595, in __next__
raise item
File "/data/user/muzic/musicbert/fairseq/fairseq/data/iterators.py", line 526, in run
for item in self._source:
File "/usr/local/python3/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 435, in __next__
data = self._next_data()
File "/usr/local/python3/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 1085, in _next_data
return self._process_data(data)
File "/usr/local/python3/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 1111, in _process_data
data.reraise()
File "/usr/local/python3/lib/python3.6/site-packages/torch/_utils.py", line 428, in reraise
raise self.exc_type(msg)
ValueError: Caught ValueError in DataLoader worker process 0.
Original Traceback (most recent call last):
File "/usr/local/python3/lib/python3.6/site-packages/torch/utils/data/_utils/worker.py", line 198, in _worker_loop
data = fetcher.fetch(index)
File "/usr/local/python3/lib/python3.6/site-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/usr/local/python3/lib/python3.6/site-packages/torch/utils/data/_utils/fetch.py", line 44, in <listcomp>
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/data/user/muzic/musicbert/fairseq/fairseq/data/base_wrapper_dataset.py", line 17, in __getitem__
return self.dataset[index]
File "/data/user/muzic/musicbert/fairseq/fairseq/data/nested_dictionary_dataset.py", line 70, in __getitem__
return OrderedDict((k, ds[index]) for k, ds in self.defn.items())
File "/data/user/muzic/musicbert/fairseq/fairseq/data/nested_dictionary_dataset.py", line 70, in <genexpr>
return OrderedDict((k, ds[index]) for k, ds in self.defn.items())
File "/data/user/muzic/musicbert/fairseq/fairseq/data/base_wrapper_dataset.py", line 17, in __getitem__
return self.dataset[index]
File "/data/user/muzic/musicbert/fairseq/fairseq/data/lru_cache_dataset.py", line 17, in __getitem__
return self.dataset[index]
File "/data/user/muzic/musicbert/musicbert/__init__.py", line 219, in __getitem__
((item[8: -8: 8] - 4) * max_instruments) + (item[8 + 2: -8 + 2: 8] - 4)].flatten()
ValueError: could not broadcast input array from shape (752) into shape (747)
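For context, the failing line in musicbert/__init__.py takes stride-8 slices of the flat token array. A minimal sketch with dummy data (the real array holds OctupleMIDI tokens, 8 per note) showing what such a slice extracts:

```python
# Dummy stand-in for the flat token array: 5 notes x 8 tokens each.
item = list(range(40))

# A stride-8 slice picks out one field of every interior note,
# mirroring the item[8:-8:8] pattern in the failing line.
field = item[8:-8:8]
print(field)  # -> [8, 16, 24]
```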
My environment:
torch 1.7.0
cuda 10
fairseq git version: 336942734c85791a90baa373c212d27e7c722662
Note that I enabled the --fp16 flag to speed up training. This error is confusing. If convenient, could you provide a copy of your preprocessed data-bin?
Hi @SkyAndCloud
It seems that the length of the token sequence is not a multiple of 8. Did you change any parameters in the scripts?
The error happened in the data loader, so it was probably not caused by --fp16.
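One way to check this hypothesis: in MusicBERT's OctupleMIDI encoding each note occupies 8 tokens, so every preprocessed sequence length should be a multiple of 8. A quick sanity-check sketch (the function name and dummy data are illustrative, not part of the MusicBERT codebase):

```python
def check_octuple_alignment(sequences):
    """Return indices of sequences whose token count is not a multiple of 8."""
    return [i for i, seq in enumerate(sequences) if len(seq) % 8 != 0]

# Dummy data: the second sequence has 13 tokens and is misaligned.
seqs = [list(range(16)), list(range(13)), list(range(8))]
print(check_octuple_alignment(seqs))  # -> [1]
```

Running something like this over the binarized dataset would pinpoint any misaligned sequences before training.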
Here is the archive of the preprocessed dataset: lmd_data_bin.
@mlzeng No, I didn't change any parameters.