dalle-playground
Out of GPU memory using rewritten backend
- I'm using the rewritten backend (consts.py, dalle_model.py, etc.) with a Docker build on a 3060 and run out of memory loading Mini, even with 12GB of VRAM.
- The README also needs updated instructions for using Mega.

dalle-backend | 2022-06-09 14:00:57.289965: E external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_driver.cc:802] failed to alloc 198967552 bytes on host: CUDA_ERROR_OUT_OF_MEMORY: out of memory
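Note that the log line above reports the failed allocation as "on host", which suggests the failing request is for host memory rather than GPU VRAM. A minimal sketch for pulling the size and target out of such lines, assuming the message format shown above; `parse_alloc_failure` is a hypothetical helper, not part of dalle-playground:

```python
import re

# Matches the message format in the log line above; other TensorFlow/XLA
# versions may word this differently (assumption).
ALLOC_RE = re.compile(r"failed to alloc (\d+) bytes on (\w+)")

def parse_alloc_failure(line):
    """Return (size_in_mib, target) for an allocation-failure line, else None."""
    match = ALLOC_RE.search(line)
    if match is None:
        return None
    size_bytes = int(match.group(1))
    return size_bytes / (1024 ** 2), match.group(2)

line = ("E external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_driver.cc:802] "
        "failed to alloc 198967552 bytes on host: CUDA_ERROR_OUT_OF_MEMORY: out of memory")
print(parse_alloc_failure(line))
```

For the line quoted in this issue, the request is roughly 190 MiB and the target is `host`.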
I am having the same problem with a Docker build under WSL2 on Windows 11, on a 3080.
I'm not using Docker; I'm running directly in WSL2 on Windows 10 and hit the same out-of-memory issue with a 3090.
Same here locally on a 3070 Ti.
I tried running it on an EC2 g4dn.2xlarge and got the same OOM. Honestly, I'm not sure what the best instance type for this is, so I can't say whether EC2 would have worked anyway.
Just an update: it shows the out-of-memory errors, but on certain instances it will actually run once everything has loaded. I'm not sure what the root cause of the error is, but I can run the server (just let it run for a while). I would start with dalle-mini (not mega); that should produce about one terminal screen of errors, then it takes a bit longer and should load. I am unable to run both mini and mega at the same time, though.
You are right, I can run the mini model locally even with the OOM errors. Mega doesn't work, but I also wasn't expecting it to work on my machine.
I use WSL2 on Windows 10 with an RTX 3080 16GB model. Neither mini nor mega works for me using the manual setup.
The JAX environment variables (the first of which was mentioned in another issue), XLA_PYTHON_CLIENT_ALLOCATOR=platform and XLA_PYTHON_CLIENT_PREALLOCATE=false, don't really seem to help.
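For anyone checking whether these variables are actually being applied: they have to be present in the process environment before JAX initializes, so the safest approach is to set them before the first `import jax` (or export them in the shell or container that launches the backend). A minimal sketch, where `configure_xla_memory` is a hypothetical helper and not part of dalle-playground:

```python
import os

def configure_xla_memory(preallocate=False, allocator="platform"):
    """Set the JAX/XLA GPU memory variables mentioned above.

    Hypothetical helper; call it before the first `import jax`.
    """
    os.environ["XLA_PYTHON_CLIENT_PREALLOCATE"] = "true" if preallocate else "false"
    if allocator is not None:
        os.environ["XLA_PYTHON_CLIENT_ALLOCATOR"] = allocator
    # Return the resulting settings so they can be logged or inspected.
    return {k: v for k, v in os.environ.items()
            if k.startswith("XLA_PYTHON_CLIENT_")}

settings = configure_xla_memory()
print(settings)

# Import JAX only after the environment is set. The import is guarded so the
# sketch still runs on a machine without JAX installed.
try:
    import jax
    print("jax", jax.__version__)
except ImportError:
    print("jax is not installed; the variables are set regardless")
```

This only shows how the settings are applied. As reported in this thread, they may not make the allocation errors go away.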
I'm not sure of a better way to do it, but I watch my GPU VRAM usage with Task Manager. When starting up Mini, it begins to hit allocation errors (totaling roughly 9GB), but I only end up with about 2GB allocated on the card once the webserver starts.
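As an alternative to Task Manager, VRAM usage can be read from `nvidia-smi`, assuming it is on the PATH in the environment where the backend runs. A minimal sketch; `gpu_memory` and `parse_memory_csv` are hypothetical helper names:

```python
import subprocess

# Ask nvidia-smi for used and total memory per GPU, in MiB, as bare CSV.
QUERY = ["nvidia-smi", "--query-gpu=memory.used,memory.total",
         "--format=csv,noheader,nounits"]

def parse_memory_csv(text):
    """Parse 'used, total' lines (MiB) into a list of (used, total) tuples."""
    rows = []
    for line in text.strip().splitlines():
        used, total = (int(field.strip()) for field in line.split(","))
        rows.append((used, total))
    return rows

def gpu_memory():
    """Return per-GPU (used_mib, total_mib), or [] if the query fails."""
    try:
        out = subprocess.run(QUERY, capture_output=True, text=True,
                             check=True).stdout
        return parse_memory_csv(out)
    except (FileNotFoundError, subprocess.CalledProcessError, ValueError):
        # nvidia-smi missing, returned an error, or printed unexpected output.
        return []

print(gpu_memory())
```

Keep in mind that with preallocation enabled, JAX reserves a large fraction of GPU memory up front, so the reported figure may reflect the reservation rather than what the model actually uses.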
The local web page also only shows "{success:true}" when I open it up in a browser, and I'm not sure where to go from here.
I was having the exact same error and I followed @raylin01's advice. I just waited it out and eventually it worked. I'm running the Mega Full model on WSL2, Windows 10 on an RTX 3090.
I was not having this error yesterday but I installed dalle flow in the meantime, and it was throwing this exact same error. So this error started after coming back from installing dalle flow. Maybe someone else is in the same situation?
Is there anyone who can help solve this problem?