Model seems to be corrupted

Open DHOFM opened this issue 5 years ago • 10 comments

Hi, this worked until yesterday, but today there is a problem downloading the model, tested under Colab and Kaggle Kernel:

vqvae = make_vqvae(setup_hparams(vqvae, dict(sample_length = 1048576)), device)

leads to


`---------------------------------------------------------------------------
FileNotFoundError                         Traceback (most recent call last)
<ipython-input-3-764e644c2a7c> in <module>()
     10 
     11 vqvae, *priors = MODELS[model]
---> 12 vqvae = make_vqvae(setup_hparams(vqvae, dict(sample_length = 1048576)), device)
     13 top_prior = make_prior(setup_hparams(priors[-1], dict()), vqvae, device)
     14 

5 frames
/usr/local/lib/python3.6/dist-packages/torch/serialization.py in __init__(self, name, mode)
    209 class _open_file(_opener):
    210     def __init__(self, name, mode):
--> 211         super(_open_file, self).__init__(open(name, mode))
    212 
    213     def __exit__(self, *args):

FileNotFoundError: [Errno 2] No such file or directory: `

wget https://openaipublic.blob.core.windows.net/jukebox/models/5b/vqvae.pth.tar

Works but cannot be unpacked:

tar: This does not look like a tar archive
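Note that a `.pth.tar` name does not guarantee a tar archive. The traceback above runs through torch/serialization.py, which suggests the checkpoint is meant to be read by `torch.load` rather than unpacked with tar. A minimal sketch for checking what a downloaded file actually is, using only the standard library (the helper name `describe_download` is made up for illustration):

```python
import os
import tarfile
import zipfile


def describe_download(path):
    """Return a rough description of a downloaded file (hypothetical helper)."""
    if not os.path.isfile(path):
        return "missing"
    if os.path.getsize(path) == 0:
        return "empty"
    if tarfile.is_tarfile(path):
        return "tar archive"
    # Recent PyTorch versions save checkpoints in a zip-based container.
    if zipfile.is_zipfile(path):
        return "zip container"
    # Older PyTorch checkpoints are pickled data, which lands here.
    return "other binary data"
```

If this reports "missing" or "empty", the download itself failed. "other binary data" does not by itself mean the file is corrupted.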

Kind regards,

Dirk

DHOFM avatar Nov 13 '20 14:11 DHOFM

I am getting this same error since last night.

Thanks so much in advance if someone can fix this or suggest a resolution!!

-J

Edit:

Perhaps this is helpful: https://github.com/openai/jukebox/issues/173

CCpt5 avatar Nov 13 '20 17:11 CCpt5

Short answer: change

!pip install git+https://github.com/openai/jukebox.git

to

!pip install git+https://github.com/tdunity/fixedjukebox.git

Long answer: OpenAI has likely changed something in Jukebox that keeps it from working on Google Colab. Maybe they trained it more, improved it and added more songs, made a better upsampling process, or optimized it.

Randy-H0 avatar Nov 13 '20 19:11 Randy-H0

Just pushed a fix, can you check again if it works now?

prafullasd avatar Nov 13 '20 20:11 prafullasd

Seems to get stuck here (been running for 10min / will leave it and update if it changes):

Downloading from azure
Running  wget -O /root/.cache/jukebox/models/5b/vqvae.pth.tar https://openaipublic.blob.core.windows.net/jukebox/models/5b/vqvae.pth.tar
Restored from /root/.cache/jukebox/models/5b/vqvae.pth.tar
0: Loading vqvae in eval mode
Loading artist IDs from /usr/local/lib/python3.6/dist-packages/jukebox/data/ids/v2_artist_ids.txt
Loading artist IDs from /usr/local/lib/python3.6/dist-packages/jukebox/data/ids/v2_genre_ids.txt
Level:2, Cond downsample:None, Raw to tokens:128, Sample length:1048576
0: Converting to fp16 params
Downloading from azure
Running  wget -O /root/.cache/jukebox/models/5b_lyrics/prior_level_2.pth.tar https://openaipublic.blob.core.windows.net/jukebox/models/5b_lyrics/prior_level_2.pth.tar

(This is using colab w/ the checkpoint variation of the script if that matters: https://colab.research.google.com/github/SMarioMan/jukebox/blob/master/jukebox/Interacting_with_Jukebox.ipynb)

It did work w/ the "!pip install git+https://github.com/tdunity/fixedjukebox.git" change mentioned above.

CCpt5 avatar Nov 13 '20 20:11 CCpt5

Ok it did complete, but it took 849s (14min) which is way longer than it had been taking to complete that cell previously.

CCpt5 avatar Nov 13 '20 20:11 CCpt5

Yeah, that's probably because the links were previously hosted on Google Cloud, and since the Colab notebook also runs on Google Cloud, the download was fast. It only has to download once though, so the second time you run that cell it should be much faster. I'll see if we can do something to speed up downloads.
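The download-once behaviour described here can be sketched roughly as follows. This is not Jukebox's actual code: `fetch_once` and the `download` callback are hypothetical names, and the temporary-file step is an added precaution so that an interrupted transfer does not leave a truncated file in the cache.

```python
import os
import tempfile


def fetch_once(url, cache_dir, download):
    """Return a local path for url, downloading only if it is not cached.

    `download(url, dest_path)` is a caller-supplied function (hypothetical),
    for example a thin wrapper around wget or urllib.
    """
    # Store the file under its remote file name inside the cache directory.
    local_path = os.path.join(cache_dir, os.path.basename(url))
    if os.path.exists(local_path):
        return local_path  # cache hit: nothing is downloaded

    os.makedirs(cache_dir, exist_ok=True)
    # Download to a temporary name first, then rename, so a failed
    # transfer never appears under the final name.
    fd, tmp_path = tempfile.mkstemp(dir=cache_dir, suffix=".part")
    os.close(fd)
    try:
        download(url, tmp_path)
        os.replace(tmp_path, local_path)
    finally:
        # Clean up the partial file if the download or rename failed.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
    return local_path
```

On Colab the cache lives on the VM's disk, so it only helps for as long as that runtime stays alive.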

prafullasd avatar Nov 13 '20 21:11 prafullasd

I ran 2 at the same time, and on the first upsampling pass one of the two timed out.

The other did eventually begin the up-sampling process but it took 3166sec (52min) to finish the below cell. It seems like download speed may pose a problem for people using colab. Thanks for your help!

`Conditioning on 1 above level(s)
Checkpointing convs
Checkpointing convs
Loading artist IDs from /usr/local/lib/python3.6/dist-packages/jukebox/data/ids/v2_artist_ids.txt
Loading artist IDs from /usr/local/lib/python3.6/dist-packages/jukebox/data/ids/v2_genre_ids.txt
Level:0, Cond downsample:4, Raw to tokens:8, Sample length:65536
Downloading from azure
Running  wget -O /root/.cache/jukebox/models/5b/prior_level_0.pth.tar https://openaipublic.blob.core.windows.net/jukebox/models/5b/prior_level_0.pth.tar
Restored from /root/.cache/jukebox/models/5b/prior_level_0.pth.tar
0: Loading prior in eval mode
Conditioning on 1 above level(s)
Checkpointing convs
Checkpointing convs
Loading artist IDs from /usr/local/lib/python3.6/dist-packages/jukebox/data/ids/v2_artist_ids.txt
Loading artist IDs from /usr/local/lib/python3.6/dist-packages/jukebox/data/ids/v2_genre_ids.txt
Level:1, Cond downsample:4, Raw to tokens:32, Sample length:262144
Downloading from azure
Running  wget -O /root/.cache/jukebox/models/5b/prior_level_1.pth.tar https://openaipublic.blob.core.windows.net/jukebox/models/5b/prior_level_1.pth.tar`

CCpt5 avatar Nov 13 '20 22:11 CCpt5

I updated the link to download from a CDN, which should improve download speeds. To try it you'll need to restart the notebook.

prafullasd avatar Nov 14 '20 08:11 prafullasd

Working much better now - thanks Prafulla!!

Edit:

Might still have issues loading the upsampler. The first of the 2 tabs I have running stopped before it finished downloading. The second did complete. Not sure if it's just something I'm doing (I'm not the most techy person) - so maybe someone else will chime in.

`Conditioning on 1 above level(s)
Checkpointing convs
Checkpointing convs
Loading artist IDs from /usr/local/lib/python3.6/dist-packages/jukebox/data/ids/v2_artist_ids.txt
Loading artist IDs from /usr/local/lib/python3.6/dist-packages/jukebox/data/ids/v2_genre_ids.txt
Level:0, Cond downsample:4, Raw to tokens:8, Sample length:65536
Downloading from azure
Running  wget -O /root/.cache/jukebox/models/5b/prior_level_0.pth.tar https://openaipublic.azureedge.net/jukebox/models/5b/prior_level_0.pth.tar`
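For downloads that stop partway, one generic workaround is to retry with a growing delay. The sketch below is only an illustration and is not part of Jukebox; `retry` and its parameters are hypothetical names, and `action` stands for whatever performs the download.

```python
import time


def retry(action, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call action() until it succeeds or the attempts run out."""
    last_error = None
    for attempt in range(attempts):
        try:
            return action()
        except OSError as err:  # covers most network and file errors
            last_error = err
            if attempt < attempts - 1:
                # Wait base_delay, then twice that, then four times, ...
                sleep(base_delay * 2 ** attempt)
    raise last_error
```

When calling wget by hand, its `-c` (continue) and `--tries` options may serve the same purpose.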

CCpt5 avatar Nov 14 '20 22:11 CCpt5

My model seems to be corrupted too. It's not generating samples. Look:

"zs = _sample(zs, labels, sampling_kwargs, [None, None, top_prior], [2], hps)"

leads to

"AssertionError Traceback (most recent call last) in ()
     16 x = load_prompts(audio_files, duration, hps)
     17 zs = top_prior.encode(x, start_level=0, end_level=len(priors), bs_chunks=x.shape[0])
---> 18 zs = _sample(zs, labels, sampling_kwargs, [None, None, top_prior], [2], hps)
     19 else:
     20 raise ValueError(f'Unknown sample mode {sample_hps.mode}.')

8 frames
/usr/local/lib/python3.6/dist-packages/jukebox/prior/conditioners.py in forward(self, pos_start, pos_end)
     89 # Check if [pos_start,pos_end] in [pos_min, pos_max)
     90 assert len(pos_start.shape) == 2, f"Expected shape with 2 dims, got {pos_start.shape}"
---> 91 assert (self.pos_min <= pos_start).all() and (pos_start < self.pos_max).all(), f"Range is [{self.pos_min},{self.pos_max}), got {pos_start}"
     92 pos_start = pos_start.float()
     93 if pos_end is not None:

AssertionError: Range is [786744.0,26460000.0), got tensor([[240.]], device='cuda:0')"
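One possible reading of this AssertionError, offered as a guess rather than a diagnosis: if the audio is sampled at 44.1 kHz (an assumption, not stated in the thread), the bounds 786744 and 26460000 correspond to 17.84 s and 600 s, so the value 240 looks like a length given in seconds where a length in samples is expected. Which notebook variable carries the 240 is not visible in the traceback. The arithmetic can be checked like this:

```python
SAMPLE_RATE = 44100  # assumption: 44.1 kHz audio, not stated in the thread

# Bounds copied from the assertion message.
POS_MIN = 786744.0
POS_MAX = 26460000.0


def seconds_to_samples(seconds, sample_rate=SAMPLE_RATE):
    """Convert a duration in seconds to a number of audio samples."""
    return int(seconds * sample_rate)


def in_range(value, lo=POS_MIN, hi=POS_MAX):
    """Mirror the half-open check from the assertion: lo <= value < hi."""
    return lo <= value < hi


print(POS_MIN / SAMPLE_RATE, POS_MAX / SAMPLE_RATE)  # bounds in seconds
print(in_range(240))                      # the value from the error message
print(in_range(seconds_to_samples(240)))  # the same duration in samples
```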

and

"KeyError Traceback (most recent call last) in ()
     16 x = load_prompts(audio_files, duration, hps)
     17 zs = top_prior.encode(x, start_level=0, end_level=len(priors), bs_chunks=x.shape[0])
---> 18 zs = _sample(zs, labels, sampling_kwargs, [None, None, top_prior], [2], hps)
     19 else:
     20 raise ValueError(f'Unknown sample mode {sample_hps.mode}.')

2 frames
/usr/local/lib/python3.6/dist-packages/jukebox/sample.py in sample_single_window(zs, labels, sampling_kwargs, level, prior, start, hps)
     58 empty_cache()
     59
---> 60 max_batch_size = sampling_kwargs['max_batch_size']
     61 del sampling_kwargs['max_batch_size']
     62

KeyError: 'max_batch_size'"
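The two source lines in this traceback read `max_batch_size` from the dict that was passed in and then delete it. If a call fails after that deletion (for example with the AssertionError above), the caller's dict no longer has the key, and re-running the cell with the same dict would raise this KeyError. The toy function below only mimics that pattern. It is not Jukebox's function, the restore step is an assumption, and the values are hypothetical.

```python
def sample_single_window_toy(sampling_kwargs, fail=False):
    """Toy stand-in that mimics the read-then-delete pattern in the traceback."""
    max_batch_size = sampling_kwargs['max_batch_size']
    del sampling_kwargs['max_batch_size']
    if fail:
        raise AssertionError("simulated failure mid-call")
    # Assumed restore step; the traceback does not show the rest of the function.
    sampling_kwargs['max_batch_size'] = max_batch_size
    return max_batch_size


kwargs = dict(temp=0.98, max_batch_size=16)  # hypothetical values
try:
    sample_single_window_toy(kwargs, fail=True)
except AssertionError:
    pass
print('max_batch_size' in kwargs)  # False: the failed call removed the key

# Workaround: rebuild the dict, or pass a copy so the original stays intact.
fresh = dict(temp=0.98, max_batch_size=16)
print(sample_single_window_toy(dict(fresh)))  # 16
```

In the notebook, re-running the cell that defines sampling_kwargs before sampling again should have the same effect.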

Gertie01 avatar Feb 20 '21 21:02 Gertie01