Replicate CUDA out of memory

Open loboere opened this issue 2 years ago • 7 comments

The Replicate demo fails after generating a few images:

CUDA out of memory. Tried to allocate 1.22 GiB (GPU 0; 39.59 GiB total capacity; 30.86 GiB already allocated; 1.07 GiB free; 34.03 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

loboere avatar Jun 18 '22 23:06 loboere
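
The error message above points at max_split_size_mb and PYTORCH_CUDA_ALLOC_CONF. As a minimal sketch of what that hint means in practice, the allocator option can be set through the environment before PyTorch makes its first CUDA allocation. The value 128 is an arbitrary illustration, not a setting anyone in this thread reported using, and whether it helps depends on whether fragmentation is actually the cause.

```python
import os

# PYTORCH_CUDA_ALLOC_CONF is read by PyTorch's caching allocator, so it has to
# be in the environment before the first CUDA allocation. Setting it before
# `import torch` is the simplest way to guarantee that.
# 128 is an illustrative value (assumption), not a recommendation from this thread.
os.environ.setdefault("PYTORCH_CUDA_ALLOC_CONF", "max_split_size_mb:128")

print(os.environ["PYTORCH_CUDA_ALLOC_CONF"])
```

Using setdefault keeps any value that was already exported in the shell, so the script does not silently override a deliberate configuration.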

I am getting the same error :{

TGChrisRArendt avatar Jun 19 '22 03:06 TGChrisRArendt

Same for me. Only about 1 in 5 runs actually generates images. Very nice demo though. Thank you for making the work available. Good luck!

dza6549 avatar Jun 19 '22 03:06 dza6549

Hi, I have checked the CUDA usage locally, and some memory does seem to persist after a predict() run. I have pushed a new version to the website that frees about 10G of cached memory between consecutive inferences, which should solve the issue. Give it a try and see whether it helps!

chenxwh avatar Jun 19 '22 10:06 chenxwh
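
The exact change pushed to the demo is not shown in this thread. A common way to free cached memory between consecutive inferences in PyTorch looks roughly like the sketch below. The helper names free_cached_gpu_memory and predict_with_cleanup are hypothetical, and this is an assumption about the general technique, not the maintainer's actual code.

```python
import gc

try:
    import torch
except ImportError:  # lets the sketch run on a machine without PyTorch
    torch = None


def free_cached_gpu_memory():
    """Drop unreachable Python objects, then release PyTorch's cached CUDA blocks.

    Returns the number of bytes PyTorch still reserves afterwards
    (0 when PyTorch or CUDA is unavailable).
    """
    gc.collect()  # tensors kept alive only by reference cycles are freed first
    if torch is not None and torch.cuda.is_available():
        torch.cuda.empty_cache()  # hand unused cached blocks back to the driver
        return torch.cuda.memory_reserved()
    return 0


def predict_with_cleanup(run_inference, *args, **kwargs):
    """Call a (hypothetical) inference function and always clean up afterwards."""
    try:
        return run_inference(*args, **kwargs)
    finally:
        free_cached_gpu_memory()
```

Note that empty_cache() only releases blocks that no live tensor is using. Memory held by tensors that are still referenced, for example intermediate results stored on an object, stays allocated until those references are dropped.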

https://replicate.com/p/ih4rc5rid5cu3km25xzki4rk3q

tom-doerr avatar Jun 23 '22 07:06 tom-doerr

How much VRAM is needed for this to run?

illtellyoulater avatar Jun 24 '22 03:06 illtellyoulater

Tried to allocate 72.00 MiB (GPU 0; 23.68 GiB total capacity; 21.73 GiB already allocated; 46.31 MiB free; 21.74 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

I tried reducing --max-inference-batch-size to 8.

Limbicnation avatar Jul 11 '22 00:07 Limbicnation
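
The fragmentation hint in these messages only applies when reserved memory is much larger than allocated memory. A small sketch that pulls both figures out of the two messages quoted in this thread makes the difference visible. The helper name oom_gap_gib is hypothetical, and the strings below are shortened copies of the messages above.

```python
import re

# Conversion factors to GiB for the two units that appear in the messages.
_TO_GIB = {"MiB": 1 / 1024, "GiB": 1.0}


def oom_gap_gib(message):
    """Return reserved minus allocated memory, in GiB, from a CUDA OOM message."""

    def field(label):
        match = re.search(r"([\d.]+) (MiB|GiB) " + label, message)
        if match is None:
            raise ValueError(f"field not found: {label}")
        return float(match.group(1)) * _TO_GIB[match.group(2)]

    return field("reserved in total") - field("already allocated")


first = ("Tried to allocate 1.22 GiB (GPU 0; 39.59 GiB total capacity; "
         "30.86 GiB already allocated; 1.07 GiB free; "
         "34.03 GiB reserved in total by PyTorch)")
second = ("Tried to allocate 72.00 MiB (GPU 0; 23.68 GiB total capacity; "
          "21.73 GiB already allocated; 46.31 MiB free; "
          "21.74 GiB reserved in total by PyTorch)")

print(f"{oom_gap_gib(first):.2f} GiB")
print(f"{oom_gap_gib(second):.2f} GiB")
```

The gap is about 3.17 GiB in the first report and about 0.01 GiB in the second. In the second case nearly everything PyTorch reserved is in use, so max_split_size_mb is unlikely to help there and the card is most likely simply full.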

Honestly, this AI model is entirely useless if the demo system can't even produce a SINGLE image. And this is based on around 400 attempts.

cryofield avatar May 10 '23 01:05 cryofield