Hi everybody,
While trying to train a new VQ-VAE on my own .wav files on a local GTX 1080 GPU, I got the error below. Any help would be appreciated.
```
Traceback (most recent call last):
  File "jukebox/train.py", line 336, in <module>
    fire.Fire(run)
  File "/home/monistan/anaconda3/envs/jukebox/lib/python3.7/site-packages/fire/core.py", line 127, in Fire
    component_trace = _Fire(component, args, context, name)
  File "/home/monistan/anaconda3/envs/jukebox/lib/python3.7/site-packages/fire/core.py", line 366, in _Fire
    component, remaining_args)
  File "/home/monistan/anaconda3/envs/jukebox/lib/python3.7/site-packages/fire/core.py", line 542, in _CallCallable
    result = fn(*varargs, **kwargs)
  File "jukebox/train.py", line 319, in run
    train_metrics = train(distributed_model, model, opt, shd, scalar, ema, logger, metrics, data_processor, hps)
  File "jukebox/train.py", line 204, in train
    for i, x in logger.get_range(data_processor.train_loader):
  File "/home/monistan/anaconda3/envs/jukebox/lib/python3.7/site-packages/tqdm/std.py", line 1127, in __iter__
    for obj in iterable:
  File "/home/monistan/anaconda3/envs/jukebox/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 345, in __next__
    data = self._next_data()
  File "/home/monistan/anaconda3/envs/jukebox/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 856, in _next_data
    return self._process_data(data)
  File "/home/monistan/anaconda3/envs/jukebox/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 881, in _process_data
    data.reraise()
  File "/home/monistan/anaconda3/envs/jukebox/lib/python3.7/site-packages/torch/_utils.py", line 394, in reraise
    raise self.exc_type(msg)
AttributeError: Caught AttributeError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "/home/monistan/anaconda3/envs/jukebox/lib/python3.7/site-packages/torch/utils/data/_utils/worker.py", line 178, in _worker_loop
    data = fetcher.fetch(index)
  File "/home/monistan/anaconda3/envs/jukebox/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/monistan/anaconda3/envs/jukebox/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 44, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/monistan/jukebox/jukebox/data/data_processor.py", line 22, in __getitem__
    return self.dataset.get_item(self.start + item, test=self.test)
  File "/home/monistan/jukebox/jukebox/data/files_dataset.py", line 90, in get_item
    return self.get_song_chunk(index, offset, test)
  File "/home/monistan/jukebox/jukebox/data/files_dataset.py", line 79, in get_song_chunk
    data, sr = load_audio(filename, sr=self.sr, offset=offset, duration=self.sample_length)
  File "/home/monistan/jukebox/jukebox/utils/io.py", line 48, in load_audio
    frame = frame.to_ndarray(format='fltp') # Convert to floats and not int16
AttributeError: 'NoneType' object has no attribute 'to_ndarray'
```
Huh, we've never seen this error before. It looks like we're using av version 8.0.2. Could you try a similar version? It'd also be useful to make sure none of the audio files are corrupt.
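For checking corrupt files, here's a rough sketch using the stdlib `wave` module (note this only validates .wav headers and frame counts; Jukebox decodes through av, so a file could pass this check and still trip av, but it catches the obvious cases):

```python
import wave
from pathlib import Path

def find_bad_wavs(audio_dir):
    """Return paths of .wav files that can't be opened or contain no frames."""
    bad = []
    for path in sorted(Path(audio_dir).glob("*.wav")):
        try:
            with wave.open(str(path), "rb") as wf:
                if wf.getnframes() == 0:  # header parses but file holds no audio
                    bad.append(path)
        except (wave.Error, EOFError):  # malformed header or truncated file
            bad.append(path)
    return bad
```

Point it at your `--audio_files_dir` and delete or re-export whatever it flags.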
I got the same error when training the prior on over 30 .wav files.
I have the same problem as well. I'm trying to train a VQ-VAE using a few .wav files.
I'm using av version 8.0.1, I'll try using 8.0.2 and let you know.
EDIT: av 8.0.2 doesn't fix the problem.
Any suggestions?
I'm using the following arguments:
```
!mpiexec -n 1 python jukebox/train.py --hps=small_vqvae --name=small_vqvae --sample_length=8 --bs=1 \
  --audio_files_dir='/content/drive/My Drive/jukebox/dataset' --labels=False --train --aug_shift --aug_blend
```
For some reason, sample_length wouldn't work with any value above 8; larger values threw `AssertionError: Midpoint X of item beyond total length Y`.
Got it to run a little further by setting sr=22050 in hparams.py, since the sample rate of my files appeared to be 22048. But it still stops with the same error after some hours.
Yes, that seemed to do the trick (setting the sr hyperparameter to the sample rate of the .wav files). I should have checked more carefully that the sample rates matched.
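To verify the sample rates up front instead of guessing, something like this reports what's actually in the dataset folder (a stdlib-only sketch; the directory argument is whatever you pass as `--audio_files_dir`), so you can see whether it agrees with the sr in hparams.py:

```python
import wave
from collections import Counter
from pathlib import Path

def sample_rate_report(audio_dir):
    """Count how many .wav files use each sample rate."""
    rates = Counter()
    for path in Path(audio_dir).glob("*.wav"):
        with wave.open(str(path), "rb") as wf:
            rates[wf.getframerate()] += 1
    return rates  # e.g. Counter({22050: 28, 44100: 2}) exposes a mismatch
```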
Now to fix the 'midpoint beyond total length' error, haha. Any ideas on that? I've already set sample_length to 8... I don't think lowering it further would help.
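On the off chance the midpoint assertion comes from files that are shorter than sample_length (just a guess, I haven't traced the exact check in files_dataset.py), a quick scan could flag any too-short files:

```python
import wave
from pathlib import Path

def files_shorter_than(audio_dir, sample_length):
    """Flag .wav files with fewer frames than the training sample_length."""
    short = []
    for path in sorted(Path(audio_dir).glob("*.wav")):
        with wave.open(str(path), "rb") as wf:
            if wf.getnframes() < sample_length:
                short.append((path.name, wf.getnframes()))
    return short  # list of (filename, frame_count) pairs
```

Anything it returns could be removed from the dataset (or concatenated into a longer file) before retrying.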
I have the same problem. Did anyone solve it and successfully train a new VQ-VAE?
I changed sr to 44100 and it worked.