
OOM using Colab

Open · mackamann opened this issue 3 years ago · 2 comments

Over the past few weeks I've been unable to run the Colab notebook, which I had been running successfully for months. The runtime I'm being assigned has:

GPU 0: Tesla P100-PCIE-16GB

The last output before the runtime crashes from running out of memory is:

Downloading from azure
Running  wget -O /root/.cache/jukebox/models/5b/vqvae.pth.tar https://openaipublic.azureedge.net/jukebox/models/5b/vqvae.pth.tar
Restored from /root/.cache/jukebox/models/5b/vqvae.pth.tar
0: Loading vqvae in eval mode

From the runtime logs (newest first):

Timestamp | Level | Message
-- | -- | --
Apr 24, 2021, 11:18:32 AM | WARNING | WARNING:root:kernel 665437f1-8066-43a5-94ea-0304ed2d78bb restarted
Apr 24, 2021, 11:18:32 AM | INFO | KernelRestarter: restarting kernel (1/5), keep random ports
Apr 24, 2021, 11:18:04 AM | WARNING | 2021-04-24 15:18:04 (45.4 MB/s) - ‘/root/.cache/jukebox/models/5b/vqvae.pth.tar’ saved [7726329/7726329]
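The logs show the kernel being restarted rather than a Python exception being raised. When PyTorch runs out of GPU memory it usually raises a `CUDA out of memory` error inside the notebook, whereas exhausting host RAM tends to get the kernel process killed and restarted, so it may be worth checking both limits on the assigned runtime. A minimal stdlib-only sketch, assuming a Linux VM with `/proc/meminfo` and, optionally, `nvidia-smi` on the `PATH`; `host_memory_gib` and `gpu_memory_report` are hypothetical helper names:

```python
import shutil
import subprocess


def host_memory_gib(path="/proc/meminfo"):
    """Return (total, available) host RAM in GiB, parsed from /proc/meminfo."""
    fields = {}
    with open(path) as f:
        for line in f:
            key, value = line.split(":", 1)
            # Each line looks like "MemTotal:   <n> kB"; keep the integer part.
            fields[key] = int(value.split()[0])
    kib_per_gib = 1024 ** 2
    return fields["MemTotal"] / kib_per_gib, fields["MemAvailable"] / kib_per_gib


def gpu_memory_report():
    """Return nvidia-smi's name/total/used summary, or None if nvidia-smi is absent."""
    if shutil.which("nvidia-smi") is None:
        return None
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,memory.total,memory.used",
         "--format=csv,noheader"],
        capture_output=True, text=True,
    )
    return result.stdout.strip() or None


if __name__ == "__main__":
    total, available = host_memory_gib()
    print(f"Host RAM: {total:.1f} GiB total, {available:.1f} GiB available")
    print(f"GPU: {gpu_memory_report() or 'not detected'}")
```

If available host RAM is well below what the checkpoints need while GPU memory is mostly free, the crash is more likely a host-RAM limit than a GPU one.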

The cell that crashes contains the following, which I suspect is the culprit:

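from jukebox.make_models import make_vqvae, make_prior, MODELS
from jukebox.hparams import setup_hparams
# `model` (e.g. '5b_lyrics') and `device` are set in earlier notebook cells
vqvae, *priors = MODELS[model]
vqvae = make_vqvae(setup_hparams(vqvae, dict(sample_length=1048576)), device)
top_prior = make_prior(setup_hparams(priors[-1], dict()), vqvae, device)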
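To confirm which of these lines exhausts memory, one option is to print the process's peak resident memory after each step. Peak rather than current usage is used here because loading a checkpoint can spike memory only briefly. A minimal sketch with the standard-library `resource` module, assuming Linux (where `ru_maxrss` is reported in KiB); `peak_rss_gib` and `report_peak_rss` are hypothetical helper names, and the commented-out lines only show where the calls would sit around the cell above:

```python
import resource


def peak_rss_gib():
    """Peak resident set size of this process so far, in GiB (Linux reports KiB)."""
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss / 1024 ** 2


def report_peak_rss(label):
    """Print the peak RSS after a named step; flush so the line is shown before a crash."""
    print(f"[{label}] peak RSS: {peak_rss_gib():.2f} GiB", flush=True)


report_peak_rss("before loading")
# Hypothetical placement around the crashing cell:
# vqvae = make_vqvae(setup_hparams(vqvae, dict(sample_length=1048576)), device)
# report_peak_rss("after make_vqvae")
# top_prior = make_prior(setup_hparams(priors[-1], dict()), vqvae, device)
# report_peak_rss("after make_prior")
```

Because the peak only ever grows, the last line printed before the restart points at the step that pushed memory over the limit.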

Has something changed?

mackamann, Apr 24 '21 15:04

I am also running into this issue. I tried decreasing the batch_size and the chunk_size but it is still happening. Any help?

lauraibnz, May 10 '21 11:05

I don't even get as far as the batch settings: I run out of memory in Colab while importing libraries in the third cell of the notebook.
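One way to find out which import in that cell is responsible is to import the modules one at a time and print how much the process's resident memory grows after each. A rough sketch, assuming Linux's `/proc/self/status`; `current_rss_mib` and `profile_imports` are hypothetical helper names, and the module list below is only a stdlib placeholder to be swapped for the cell's actual imports:

```python
import importlib


def current_rss_mib():
    """Current resident set size of this process in MiB, read from /proc/self/status."""
    with open("/proc/self/status") as f:
        for line in f:
            if line.startswith("VmRSS:"):
                return int(line.split()[1]) / 1024  # value is reported in kB
    raise RuntimeError("VmRSS not found in /proc/self/status")


def profile_imports(module_names):
    """Import each module in turn and return the RSS change (MiB) it caused."""
    growth = {}
    for name in module_names:
        before = current_rss_mib()
        importlib.import_module(name)
        growth[name] = current_rss_mib() - before
        print(f"{name:<20} {growth[name]:+8.1f} MiB", flush=True)
    return growth


if __name__ == "__main__":
    # Placeholder list: replace with the notebook cell's own imports.
    profile_imports(["json", "decimal", "sqlite3"])
```

Modules that were already imported earlier in the session will show roughly zero growth, so this works best at the top of a freshly restarted runtime.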

johnnogent, Jun 28 '21 20:06