
Can't generate images on google colab: CUDA out of memory or UBER slow generations on T4

Open juan9999 opened this issue 2 years ago • 4 comments

Hey, using the latest notebook (and a completely fresh install), generating 512x512 images with 30 steps using SD 1.5, I'm getting the following error when I try to generate:

File "/usr/local/lib/python3.8/dist-packages/torch/nn/functional.py", line 2528, in group_norm
    return torch.group_norm(input, num_groups, weight, bias, eps, torch.backends.cudnn.enabled)
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 2.81 GiB (GPU 0; 14.76 GiB total capacity; 9.31 GiB already allocated; 237.75 MiB free; 13.25 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.
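(Editor's note: the `max_split_size_mb` hint in the error message refers to PyTorch's documented `PYTORCH_CUDA_ALLOC_CONF` environment variable. A minimal sketch of applying it in a Colab cell before the webui starts; the value 128 is illustrative, not a recommendation from this thread:)

```python
import os

# The OOM message suggests setting max_split_size_mb to reduce fragmentation.
# This must be set before torch initializes CUDA, e.g. in the first Colab cell.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"
```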

So I disconnected the server and relaunched the notebook, and it worked once (averaging 1.46s/it), but the second time I generated, it froze at 23% in the web GUI with no progress reported in the notebook. I tried to restart the GUI by refreshing the page, but it was no longer responsive.

Any tips to debug?

juan9999 avatar Dec 15 '22 06:12 juan9999

reduce the batch size

TheLastBen avatar Dec 15 '22 07:12 TheLastBen

I initially tried to generate 4 images, then one...

What's weird is that today I can generate 8 batches of 8 images with no problems!

Must be a google environment thing?!?

I did delete a whole bunch of files from my Google Drive around the same time (to get a fresh install after the errors), and in the preview window, images were taking forever to load, when they loaded at all.

juan9999 avatar Dec 15 '22 15:12 juan9999

sometimes it's useful to remove the sd folder to allow a complete update of the webui files

TheLastBen avatar Dec 15 '22 16:12 TheLastBen

That's what I did yesterday. I wiped out all folders and started again.

Last night I had those issues, and today I don't, using the same sd folder I created last night!

juan9999 avatar Dec 15 '22 16:12 juan9999

Well, I have wiped and wiped again and washed it all out, and it still doesn't work for hypernetworks. It runs out of CUDA RAM if I use --no-half (which is what fixes Loss: nan), and if I don't add it, training runs but gives Loss: nan and the results are barely different between 100 epochs and 200 epochs. I also must use --medvram or forget it.

Something is broken.

DarkAlchy avatar Dec 17 '22 10:12 DarkAlchy

@DarkAlchy how did you add --medvram on Colab? Essentially, how do you add command-line args on Colab? Locally we can just add them to webui-user.bat, but what about on Colab?
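(Editor's note: one hedged sketch of how this is commonly done. AUTOMATIC1111's launcher reads extra flags from the COMMANDLINE_ARGS environment variable, the same variable webui-user.bat sets locally, so a Colab cell can set it before the notebook's launch cell runs; whether this particular notebook also honors it is an assumption.)

```python
import os, shlex

# Assumption: the webui launcher reads COMMANDLINE_ARGS, as webui-user.bat does locally.
# Run this in a Colab cell before the cell that starts the webui.
os.environ["COMMANDLINE_ARGS"] = "--medvram --no-half"

# Alternatively, flags can usually be passed straight to launch.py in a cell:
#   !python launch.py --medvram --no-half
print(shlex.split(os.environ["COMMANDLINE_ARGS"]))
```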

rohitkrishna094 avatar May 28 '23 05:05 rohitkrishna094