
Inference runs but no GPU memory/VRAM is used

Open Zctoylm0927 opened this issue 1 year ago • 3 comments

System Info / 系統信息

cuda 11.8, A800, pytorch 2.4

Information / 问题信息

  • [X] The official example scripts / 官方的示例脚本
  • [ ] My own modified scripts / 我自己修改的脚本和任务

Reproduction / 复现过程

python inference/cli_demo.py --prompt "xxx"

Expected behavior / 期待表现

I turned off all four options during inference, since an A800 80G should be enough (the text encoder and transformer take about 20 GB) and I want the minimum latency. But a single step takes 50 minutes and no VRAM is used.

[screenshot]

When I added the following two lines per #197, each step takes about 20 seconds instead.

pipe.enable_model_cpu_offload()
pipe.vae.enable_tiling()

If I want to load the whole model onto the GPU, what should I do? Thanks.

Zctoylm0927 avatar Sep 04 '24 06:09 Zctoylm0927

Remove those two lines and then call pipe.to("cuda").
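For reference, `pipe.to("cuda")` moves every registered component (text encoder, transformer, VAE) onto the GPU in one call, the opposite of `enable_model_cpu_offload()`. A minimal sketch of that placement semantics using a stand-in `torch.nn` module (not the actual CogVideoX pipeline; it falls back to CPU when no GPU is present):

```python
import torch

# Use the GPU when available; the same .to(device) call works for a
# diffusers pipeline object.
device = "cuda" if torch.cuda.is_available() else "cpu"

# Stand-in for the pipeline: .to(device) moves all parameters at once,
# so there is no per-step host/device shuffling afterwards.
model = torch.nn.Linear(8, 8).to(device)

x = torch.randn(2, 8, device=device)
out = model(x)
assert next(model.parameters()).device == x.device
```

With the real pipeline, the equivalent is to drop the `enable_model_cpu_offload()` and VAE-tiling calls and call `pipe.to("cuda")` once after loading the checkpoint.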

zRzRzRzRzRzRzR avatar Sep 04 '24 11:09 zRzRzRzRzRzRzR

Has this been solved? I'm also on an A800 with CUDA 12.4; I tried everything above and nothing fixed it. [screenshot]

tonney007 avatar Sep 06 '24 10:09 tonney007

Changing it in scheduling_ddim_cogvideox.py to prev_timestep = int(prev_timestep.to('cpu').item()) solves it.
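The slowdown plausibly comes from `prev_timestep` remaining a tensor inside the scheduler loop; converting it to a plain Python `int`, as the fix above does, keeps the later arithmetic and indexing on the host. A CPU-only illustration of that conversion (the tensor here is a stand-in for the scheduler's timestep value):

```python
import torch

# Stand-in for the timestep tensor the scheduler would compute on the GPU.
prev_timestep = torch.tensor(981)

# The fix from scheduling_ddim_cogvideox.py: pull the scalar back to the
# host once and keep it as a plain Python int from then on.
prev_timestep = int(prev_timestep.to("cpu").item())
```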

tonney007 avatar Sep 06 '24 11:09 tonney007

Remove those two lines and then call pipe.to("cuda").

That worked for me.

Zctoylm0927 avatar Sep 12 '24 09:09 Zctoylm0927