
CUDA out of memory, but it's not

Open justinvforvendetta opened this issue 11 months ago • 16 comments

torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 576.00 MiB. GPU 0 has a total capacity of 24.00 GiB of which 0 bytes is free. Of the allocated memory 22.86 GiB is allocated by PyTorch, and 202.20 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)

So obviously I have enough memory. I have a second GPU as well.

So why is PyTorch allocating 22GB but can't allocate 576MB? I don't understand. I tried adding `torch.cuda.empty_cache()` in my img.py (I used the example script from the README) in case my cache was full... no luck.

I'm literally just trying to use the txt2img example from the README, completely unmodified, except for attempting to empty the cache at the start.
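One detail worth noting from the error text: `PYTORCH_CUDA_ALLOC_CONF` is read when PyTorch's CUDA caching allocator initializes, so it has to be set before the first CUDA allocation, not partway through the script. A minimal sketch of where it would go in img.py (the placement and commented-out lines are illustrative, not from the Janus repo):

```python
import os

# Must be set before torch makes its first CUDA allocation; setting it after
# the model has loaded has no effect on the already-initialized allocator.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "expandable_segments:True"

# import torch  # only import torch after the env var is in place
# ...
# torch.cuda.empty_cache()  # frees cached blocks, but cannot undo fragmentation
```

By contrast, `empty_cache()` only returns unused cached blocks to the driver; it cannot defragment memory that is already carved up, which is why it didn't help here.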

justinvforvendetta avatar Jan 27 '25 22:01 justinvforvendetta

Have the same on an RTX 4090 with 24GB VRAM (24 cores @ 4.5GHz, 192GB RAM)

Though in my case I can see it occupies 100% of the GPU

So I'm wondering what the VRAM requirements are for the 7B model

advissor avatar Jan 27 '25 23:01 advissor

I have two 24GB cards, 128GB RAM, and 24 cores as well... is it possible this isn't enough?!

justinvforvendetta avatar Jan 27 '25 23:01 justinvforvendetta

I should also add that I do not experience this issue in Stable Diffusion or any other local AI installation, for that matter.

justinvforvendetta avatar Jan 28 '25 00:01 justinvforvendetta

I encountered the same error. How can I configure multiple CUDA devices?

imyunjeong avatar Jan 28 '25 01:01 imyunjeong

> I encountered the same error. How can I configure multiple CUDA devices?

Use the following: `pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118`

or generate the command for your configuration here: https://pytorch.org/get-started/locally/

mkamranr avatar Jan 28 '25 10:01 mkamranr

> Have the same on an RTX 4090 with 24GB VRAM (24 cores @ 4.5GHz, 192GB RAM)
>
> Though in my case I can see it occupies 100% of the GPU
>
> So I'm wondering what the VRAM requirements are for the 7B model

I am using A6000 GPUs and the model performs pretty well; it doesn't even consume more than 30% of the GPU.

mkamranr avatar Jan 28 '25 10:01 mkamranr

> > I encountered the same error. How can I configure multiple CUDA devices?
>
> Use the following: `pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118`
>
> or generate the command for your configuration here: https://pytorch.org/get-started/locally/

I was using cu124; maybe that version is too new?

Update: your solution is wrong.
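For anyone comparing cu118 vs cu124 wheels, a quick generic check (not Janus-specific) of which CUDA build of torch is actually installed; the helper name here is made up for illustration:

```python
def report_torch_cuda_build() -> str:
    """Return the installed torch version, the CUDA toolkit it was built
    against, and whether a CUDA device is actually usable."""
    import torch  # local import: torch is heavy and may be absent

    return (
        f"torch {torch.__version__} / "
        f"built for CUDA {torch.version.cuda} / "
        f"available={torch.cuda.is_available()}"
    )
```

A mismatched wheel typically fails at device initialization rather than with this particular OOM, so this mostly rules the wheel out as the cause.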

justinvforvendetta avatar Jan 28 '25 12:01 justinvforvendetta

> So I'm wondering what the VRAM requirements are for the 7B model

24 GB of (V)RAM is not enough for the 7B model. I was barely able to get it running with 32 GB RAM and 8 GB of swap without the script getting OOM-killed. After the checkpoint shards were loaded, RAM usage dropped a bit.

flobeier avatar Jan 28 '25 17:01 flobeier

@flobeier

Understood.

I haven't found any info on VRAM requirements for these models (too early, I guess).

Also, text models of 1-13B parameters typically have no issue with 24GB of VRAM. Apparently text2image is a little bit different. Now I know :)

advissor avatar Jan 28 '25 17:01 advissor

@flobeier I have two cards, but no luck getting them to work together for Janus.

justinvforvendetta avatar Jan 28 '25 17:01 justinvforvendetta

@justinvforvendetta maybe this helps.

flobeier avatar Jan 28 '25 18:01 flobeier

Pro-7B inference is runnable on a 4090 24GB if you change `parallel_size` from 16 to a smaller number, e.g. 4, in generation_inference.py.

NB: you need to install torch 2.1.0 instead of the torch 2.0.1 specified in requirements.txt.

The output quality on the test-case prompts seems quite mediocre, and the resolution is only 384x384. I don't see how it can be comparable to the results from SD3-medium and other mainstream diffusion/DiT models as described in the tech report... Probably its value lies more in visual understanding and one-step inference...
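For context on why `parallel_size` is the knob that matters: the generation script samples that many images in one batch, doubled for classifier-free guidance, so activation memory scales roughly linearly with it. A back-of-the-envelope helper (the defaults of 16 and 576 image tokens per 384x384 image are taken from the upstream script; treat them as assumptions):

```python
def image_tokens_in_flight(parallel_size: int, image_token_num: int = 576) -> int:
    """Total image tokens the sampling loop keeps alive at once.

    The factor of 2 accounts for the conditional + unconditional sequences
    used by classifier-free guidance.
    """
    return parallel_size * 2 * image_token_num


print(image_tokens_in_flight(16))  # default: 18432 tokens in flight
print(image_tokens_in_flight(4))   # reduced: 4608 tokens, ~4x less activation memory
```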

yuchen1984 avatar Jan 30 '25 17:01 yuchen1984

@yuchen1984 absolutely. SD3 Large is several orders of magnitude better than this. Quite literally, *most* of this code is cloned from OpenAI and ComfyUI.

justinvforvendetta avatar Jan 30 '25 17:01 justinvforvendetta

Is it possible to run Janus-Pro-7B txt2img inference on multiple GPUs now? It seems 24GB of VRAM is not enough.
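One hedged option for splitting the model across two 24GB cards is Accelerate's `device_map="auto"` in place of the README's `.cuda()` call. This is a generic transformers/accelerate pattern, not something the Janus README documents, so treat the details (dtype choice, memory figures) as assumptions:

```python
from typing import Any


def load_janus_sharded(model_path: str = "deepseek-ai/Janus-Pro-7B") -> Any:
    """Sketch: shard the checkpoint across all visible GPUs instead of
    calling .cuda(), which places everything on GPU 0.

    Requires `pip install accelerate` alongside transformers.
    """
    from transformers import AutoModelForCausalLM  # heavy import, kept local

    return AutoModelForCausalLM.from_pretrained(
        model_path,
        trust_remote_code=True,
        device_map="auto",       # lets accelerate spread layers over GPU 0 and GPU 1
        torch_dtype="bfloat16",  # ~14 GB of weights instead of ~28 GB in fp32
    )
```

Whether the custom Janus model code plays nicely with layer sharding is untested here; halving the weight size with `torch_dtype` alone may already be enough to fit on one card.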

feroooooo avatar Jan 31 '25 07:01 feroooooo

I'm getting torch.OutOfMemoryError on an 80GB card running via Modal.com with parallel_size=1, but otherwise all default settings as per the code example on GitHub, so I have no idea what I'm doing wrong. That's using torch 2.6.0.

The error suggested trying PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True, but it didn't help.

kwilkins-82 avatar Mar 23 '25 10:03 kwilkins-82

You probably have a different AI installation with env variables interfering, @kwilkins-82.

justinvforvendetta avatar Mar 23 '25 20:03 justinvforvendetta