CogVideo
Running CogVideoX-5B on T4/V100 Free Colab Space
torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 56.50 GiB.
V100 32G
5B model, with the enable_model_cpu_offload() option and the pipe.vae.enable_tiling() optimization enabled
using diffusers (cli_demo.py)
update diffusers to 0.30.1
I am using diffusers 0.30.1
Can you try the code in the cogvideox-dev branch with its requirements and run cli_demo.py again?
Also, use breakpoint() to locate the line that triggers the OOM, thanks.
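For reference, one alternative to an interactive breakpoint is to wrap the inference call so the traceback pinpoints the failing line when the OOM fires. This is only a sketch, not tooling from the CogVideo repo; `run_with_oom_report` and `fake_inference` are hypothetical names:

```python
# Sketch (not part of the CogVideo repo): wrap the inference call so the
# printed traceback pinpoints the exact line that raised the CUDA OOM.
import traceback

def run_with_oom_report(fn):
    try:
        return fn()
    except RuntimeError as e:  # torch.cuda.OutOfMemoryError subclasses RuntimeError
        if "out of memory" in str(e).lower():
            traceback.print_exc()  # prints the file/line of the failed allocation
            # with torch available, torch.cuda.memory_summary() adds allocator stats
        raise

# toy demonstration with a simulated OOM instead of a real GPU allocation
def fake_inference():
    raise RuntimeError("CUDA out of memory. Tried to allocate 56.50 GiB.")

try:
    run_with_oom_report(fake_inference)
except RuntimeError:
    print("OOM located via traceback")
```

In a real run you would pass the pipeline call (e.g. `lambda: pipe(prompt=...)`) instead of `fake_inference`.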
I don't know how to use that to locate the OOM code line. Maybe this log will be helpful.
And I will test the cogvideox-dev branch as soon as possible.
I had the same problem with a V100, and it was solved by switching to an A10. It seems to be a graphics-card problem.
I think so too. I found that the V100 does not support bf16. I switched the dtype to fp16 and it worked (main branch). So it might not be necessary to test the dev branch. However, I don't know exactly why the V100's lack of bf16 support leads to OOM. Maybe the automatic type conversion multiplies the VRAM consumption, I guess.
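As a rough rule of thumb, BF16 requires CUDA compute capability 8.0 (Ampere) or newer, while the V100 is sm_70 and the T4 is sm_75. A minimal sketch of the dtype choice (the `pick_dtype` helper is hypothetical, not part of CogVideo or diffusers):

```python
# Hypothetical helper: choose a torch dtype name from the GPU's CUDA
# compute capability. BF16 needs Ampere (sm_80) or newer; the V100
# (sm_70) and T4 (sm_75) must fall back to FP16.
def pick_dtype(compute_capability):
    major, _minor = compute_capability
    if major >= 8:       # Ampere (A100, A10, RTX 30xx) or newer
        return "bfloat16"
    return "float16"     # Volta (V100), Turing (T4), and older

print(pick_dtype((7, 0)))  # V100 -> float16
print(pick_dtype((8, 0)))  # A100 -> bfloat16
```

On a real system the capability comes from torch.cuda.get_device_capability(), and the returned name maps to a dtype via getattr(torch, ...).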
I'm seeing this on my AMD RX 6900 XT. Changing the dtype does not have any effect, though. Could this have something to do with Flash Attention or Memory efficient attention support? I know that on my GPU neither of those work.
We need to look into this issue. The desktop 3060 has only 12 GB but can run the 5B model normally, yet developers report that the V100 32 GB has problems running the 5B model while the 2B model runs fine. I will check whether it is a precision issue.
OK, I just tested it on the dev branch, and the same issue occurred. It also shows as 56.50G
Check whether several key settings are enabled:

- Do not attempt to enable online quantization; it may cause errors on this GPU architecture.
- Check these key lines:
pipe = CogVideoXPipeline.from_pretrained("THUDM/CogVideoX-5b", torch_dtype=torch.float16)
pipe = CogVideoXPipeline.from_pretrained(
    "THUDM/CogVideoX-5b",
    text_encoder=text_encoder,
    transformer=transformer,
    vae=vae,
    torch_dtype=torch.float16,
)
You must use FP16 on the T4 unless you are using a GPU with Ampere or higher architecture that supports BF16. Additionally, do not use .to(device): CPU offload keeps the model in CPU memory and streams it to the GPU as needed, rather than transferring the entire model to the GPU at once.
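The difference between .to(device) and CPU offload can be shown with a toy calculation. The `peak_vram` helper and the module sizes below are made up for illustration; real numbers depend on the model and precision:

```python
# Toy illustration (not real diffusers code): .to("cuda") keeps every
# submodule resident on the GPU at once, while model CPU offload keeps
# only the currently running submodule on the GPU.
def peak_vram(module_sizes_gb, offload):
    if offload:
        return max(module_sizes_gb)   # only the largest submodule is resident
    return sum(module_sizes_gb)       # everything resident simultaneously

sizes_gb = [5, 10, 11]  # made-up sizes for text_encoder, transformer, vae
print(peak_vram(sizes_gb, offload=False))  # 26
print(peak_vram(sizes_gb, offload=True))   # 11
```

This is why the same 5B model that OOMs with .to(device) can fit on a 12 GB card with offload enabled, at the cost of extra transfer time per step.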
- Finally, check whether these four memory-saving options are enabled:
pipe.enable_model_cpu_offload()
pipe.enable_sequential_cpu_offload()
pipe.vae.enable_slicing()
pipe.vae.enable_tiling()
I am already running it normally on a T4 in Colab.
Please check whether this helps you.
So it seems the V100 cannot run in BF16 mode, but FP16 mode is not as good as BF16.
Will you release a dedicated FP16 version of the 5B model?
We tried, but the results weren’t ideal. The 5B model is currently recommended to run at BF16 precision, which is also the precision we used for training. Converting to FP16 leads to suboptimal performance. However, the 2B model has lower compatibility requirements and can run effectively in FP16.
free colab: https://github.com/camenduru/CogVideoX-5B-jupyter
Using FP16 on the T4 still errors: https://colab.research.google.com/drive/14TTaDTM3_lk69qKb5u4-1_gm_YK6lM3m?usp=sharing
# Create pipeline and run inference
pipe = CogVideoXPipeline.from_pretrained(
    "THUDM/CogVideoX-5b",
    text_encoder=text_encoder,
    transformer=transformer,
    vae=vae,
    torch_dtype=torch.float16,
)
pipe.enable_model_cpu_offload()
# pipe.enable_sequential_cpu_offload()
pipe.vae.enable_slicing()
pipe.vae.enable_tiling()
pipe.enable_sequential_cpu_offload() does not work
Why does it not work? It should be used with diffusers>=0.30.1 and the FP16 model, not INT8.
I use diffusers 0.30.2, and not INT8; you can see my code at this link: https://colab.research.google.com/drive/14TTaDTM3_lk69qKb5u4-1_gm_YK6lM3m?usp=sharing
Where is the Colab link? I can't see it; please post it below. Can the T4 run it?
https://github.com/camenduru/CogVideoX-5B-jupyter
Does generation take more than an hour?
There is no need, but it does take a long time (about 20 minutes with similar code in my T4 Colab). This is due to the computational limitations of this generation of GPUs: to compress memory usage, many time-for-space tricks are used, which makes generation very slow. Additionally, the T4 cannot run BF16 models, and the quality of FP16 inference cannot be guaranteed, so we recommend using newer GPUs for inference and fine-tuning.
It takes about 1 hour.
Hi, I was guided to post here. I am receiving an error while generating video. Processing starts, the steps proceed very quickly, and then... ERROR. I have tried:
- switching between both models (2B/5B)
- reducing the steps, guidance, etc.
- switching float types
My PC is a Dell T7810 with dual E5-2643 v4 CPUs, 64 GB DDR4 RAM, and dual P6000 GPUs (24 GB each, 48 GB total), running Ubuntu 24.04 LTS Desktop.
I am not sure if it's a memory issue; I thought this should work with one 24 GB card as per the app's description. I am using Pinokio and got this last error: File "/home/mandy/pinokio/api/cogvideo.git/app/env/lib/python3.10/site-packages/diffusers/models/attention_processor.py", line 1934, in __call__ hidden_states = F.scaled_dot_product_attention( torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 35.31 GiB. GPU
I haven't tried the P6000; that GPU is a bit too old. Perhaps you should check your versions and carefully read our CLI code to ensure your device has properly loaded the model and is using CPU offload for inference.
