jukebox-windows
Level 0 generation fails to generate past input audio
I thought maybe the first time I ran this it could've just been an error on my machine but it happened twice.
I tried to run my own primed audio with this software, using the following options:
--model=5b_lyrics --name=sample_5b_prompted levels=3 --mode=primed --audio_file=myfullsong.wav --prompt_length_in_seconds=12 --sample_length_in_seconds 90 --total_sample_length_in_seconds=193 --sr=44100 --n_samples 1 --hop_fractions=0.5,0.5,0.125
When it reaches "Sampling level 0", it seems to exit without any sampling actually happening. Levels 2 and 1 both sample, but level 0 does not: it only generates the prompt length and nothing further.
I noticed that when running with the default 20-second sample length, it generates 30 seconds for both level 2 and level 1 but only 19 seconds for level 0. Could this be related?
Yeah, same issue here :(
My command is:
python jukebox/sample.py --model=1b_lyrics --name=sample_5b --levels=3 --sample_length_in_seconds=180 --total_sample_length_in_seconds=180 --sr=44100 --n_samples=6 --hop_fraction=0.5,0.5,0.125
I've noticed that it works for short sample lengths, but apparently not for anything over 60 seconds. The traceback:
Traceback (most recent call last):
  File "jukebox/sample.py", line 270, in <module>
    fire.Fire(run)
  File "D:\Anaconda3\lib\site-packages\fire\core.py", line 127, in Fire
    component_trace = _Fire(component, args, context, name)
  File "D:\Anaconda3\lib\site-packages\fire\core.py", line 366, in _Fire
    component, remaining_args)
  File "D:\Anaconda3\lib\site-packages\fire\core.py", line 542, in _CallCallable
    result = fn(*varargs, **kwargs)
  File "jukebox/sample.py", line 267, in run
    save_samples(model, device, hps, sample_hps)
  File "jukebox/sample.py", line 235, in save_samples
    ancestral_sample(labels, sampling_kwargs, priors, hps)
  File "jukebox/sample.py", line 130, in ancestral_sample
    zs = _sample(zs, labels, sampling_kwargs, priors, sample_levels, hps)
  File "jukebox/sample.py", line 114, in _sample
    x = prior.decode(zs[level:], start_level=level, bs_chunks=zs[level].shape[0])
  File "D:\Developer\Python\audio\jukebox-master\jukebox\prior\prior.py", line 221, in decode
    x_out = self.decoder(zs, start_level=start_level, end_level=end_level, bs_chunks=bs_chunks)
  File "D:\Developer\Python\audio\jukebox-master\jukebox\vqvae\vqvae.py", line 118, in decode
    x_out = self._decode(zs_i, start_level=start_level, end_level=end_level)
  File "D:\Developer\Python\audio\jukebox-master\jukebox\vqvae\vqvae.py", line 109, in _decode
    x_out = decoder(x_quantised, all_levels=False)
  File "D:\Anaconda3\lib\site-packages\torch\nn\modules\module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "D:\Developer\Python\audio\jukebox-master\jukebox\vqvae\encdec.py", line 124, in forward
    x = level_block(x)
  File "D:\Anaconda3\lib\site-packages\torch\nn\modules\module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "D:\Developer\Python\audio\jukebox-master\jukebox\vqvae\encdec.py", line 46, in forward
    return self.model(x)
  File "D:\Anaconda3\lib\site-packages\torch\nn\modules\module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "D:\Anaconda3\lib\site-packages\torch\nn\modules\container.py", line 100, in forward
    input = module(input)
  File "D:\Anaconda3\lib\site-packages\torch\nn\modules\module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "D:\Anaconda3\lib\site-packages\torch\nn\modules\conv.py", line 202, in forward
    self.padding, self.dilation, self.groups)
RuntimeError: Calculated padded input size per channel: (2). Kernel size: (3). Kernel size can't be greater than actual input size
I figured this out! The cause is in 'make_model.py', line 142:
"rescale = lambda z_shape: (z_shape[0]*hps.n_ctx//vqvae.z_shapes[hps.level][0],)"
Here z_shape[0]*hps.n_ctx exceeds the int32 maximum (2,147,483,647) and wraps around to a negative number when 'sample_length_in_seconds' is over 54 s.
Solution: I changed line 53 in 'vqvae.py' from
"self.hop_lengths = np.cumprod(self.downsamples)"
to
"self.hop_lengths = np.cumprod(self.downsamples, dtype=np.int64)"
This works for me! The fix is simply to use the int64 type for z_shapes.
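For anyone who wants to see the wraparound in isolation, here is a minimal sketch of the mechanism. It does not import Jukebox: the helper name `scaled_lengths` is hypothetical, and the `downsamples` and `n_ctx` values are assumptions chosen only for illustration, so the exact second count at which it overflows will not necessarily match the 54 s reported above. The int32 dtype is forced explicitly because NumPy releases before 2.0 default to 32-bit integers on Windows, which is probably why this only shows up there.

```python
import numpy as np

# Illustrative values only (assumptions, not read from the Jukebox config).
downsamples = [8, 4, 4]   # assumed per-level downsampling factors
n_ctx = 8192              # assumed prior context length
sr = 44100                # sample rate used in the commands above
seconds = 60              # a sample length long enough to trigger the overflow


def scaled_lengths(dtype):
    """Mimic z_shape[0] * n_ctx with hop lengths stored in the given dtype."""
    hop_lengths = np.cumprod(downsamples, dtype=dtype)  # hop length per level
    n_samples = seconds * sr                            # raw audio length
    z_lengths = n_samples // hop_lengths                # token count per level
    return z_lengths * n_ctx                            # the product that overflows


wrapped = scaled_lengths(np.int32)   # first entry wraps around to a negative number
correct = scaled_lengths(np.int64)   # first entry stays positive

print("int32:", wrapped)
print("int64:", correct)
```

With 32-bit hop lengths the first entry comes out negative, and a negative length is presumably what later leaves the decoder's convolution with an input shorter than its kernel. Passing dtype=np.int64 to np.cumprod, as in the fix above, keeps every downstream product in 64-bit arithmetic.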