
CUDA out of memory. Tried to allocate error

Open daggs1 opened this issue 1 year ago • 1 comment

Greetings,

I'm trying to run the gradio_app demo as stated in the README and I'm getting this error:

$ python gradio_app.py
/home/worker/zero123plus/lib/python3.10/site-packages/huggingface_hub/file_download.py:1132: FutureWarning: resume_download is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use force_download=True.
  warnings.warn(
text_encoder/model.safetensors not found
Loading pipeline components...: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████| 8/8 [00:03<00:00, 2.43it/s]
Traceback (most recent call last):
  File "/home/worker/zero123plus/gradio_app.py", line 204, in <module>
    fire.Fire(run_demo)
  File "/home/worker/zero123plus/lib/python3.10/site-packages/fire/core.py", line 143, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
  File "/home/worker/zero123plus/lib/python3.10/site-packages/fire/core.py", line 477, in _Fire
    component, remaining_args = _CallAndUpdateTrace(
  File "/home/worker/zero123plus/lib/python3.10/site-packages/fire/core.py", line 693, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
  File "/home/worker/zero123plus/gradio_app.py", line 137, in run_demo
    pipeline.to(f'cuda:{_GPU_ID}')
  File "/home/worker/zero123plus/lib/python3.10/site-packages/diffusers/pipelines/pipeline_utils.py", line 727, in to
    module.to(torch_device, torch_dtype)
  File "/home/worker/zero123plus/lib/python3.10/site-packages/transformers/modeling_utils.py", line 1878, in to
    return super().to(*args, **kwargs)
  File "/home/worker/zero123plus/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1173, in to
    return self._apply(convert)
  File "/home/worker/zero123plus/lib/python3.10/site-packages/torch/nn/modules/module.py", line 779, in _apply
    module._apply(fn)
  File "/home/worker/zero123plus/lib/python3.10/site-packages/torch/nn/modules/module.py", line 779, in _apply
    module._apply(fn)
  File "/home/worker/zero123plus/lib/python3.10/site-packages/torch/nn/modules/module.py", line 779, in _apply
    module._apply(fn)
  [Previous line repeated 3 more times]
  File "/home/worker/zero123plus/lib/python3.10/site-packages/torch/nn/modules/module.py", line 804, in _apply
    param_applied = fn(param)
  File "/home/worker/zero123plus/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1159, in convert
    return t.to(
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 20.00 MiB. GPU

Any idea what is wrong? I've run the setup as stated in the README file.

daggs1 avatar May 13 '24 15:05 daggs1

I understand now: your small example requires 5 GB of VRAM, and my GPU has only 4 GB. Shame. Is there any way to reduce memory consumption?

daggs1 avatar May 14 '24 15:05 daggs1

Nowadays we do have more techniques to reduce inference-time memory, including model offloading, autotune compilation, quantization, and possibly others. We do not have the code for these ready yet, but you are welcome to contribute. The first one usually incurs no time overhead, while compilation takes some time before the run starts. Quantization would need some extra code and more tuning.
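For illustration, here is a minimal sketch of the first option (model offloading) using the generic diffusers API. This is not code from the repository: the model and pipeline identifiers are assumed from the README, and the exact spot in gradio_app.py where pipeline.to(f'cuda:{_GPU_ID}') would be replaced is an assumption.

import torch
from diffusers import DiffusionPipeline

# Model/pipeline identifiers assumed from the project README; adjust to the
# version you actually use.
pipeline = DiffusionPipeline.from_pretrained(
    "sudo-ai/zero123plus-v1.1",
    custom_pipeline="sudo-ai/zero123plus-pipeline",
    torch_dtype=torch.float16,
)

# Instead of pipeline.to('cuda:...'), let diffusers keep the submodules on the
# CPU and move each one onto the GPU only while it runs (requires `accelerate`).
pipeline.enable_model_cpu_offload()

# For very tight VRAM budgets, sequential offloading trades more speed for a
# lower peak; try it instead of the call above if needed.
# pipeline.enable_sequential_cpu_offload()

# Optional: slice the attention computation to further reduce peak memory,
# at some cost in speed.
pipeline.enable_attention_slicing()

Whether this fits into 4 GB has not been verified; enable_sequential_cpu_offload() is the more aggressive (and slower) option if plain model offloading is not enough.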

eliphatfs avatar May 26 '24 05:05 eliphatfs