CogVideo
Running CogVideoX-5B on T4/V100 Free Colab Space
torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 56.50 GiB.
V100 32G
5B model, with the enable_model_cpu_offload() option and the pipe.vae.enable_tiling() optimization enabled
using diffusers (cli_demo.py)
update diffusers to 0.30.1
I am using diffusers 0.30.1
Can you try the code in the cogvideox-dev branch with its requirements and run cli_demo.py again?
Also, use breakpoint() to locate the line that triggers the OOM, thanks.
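For reference, one alternative to an interactive breakpoint is to wrap the inference call so the traceback pinpoints the failing line when the OOM fires. This is only a sketch, not tooling from the CogVideo repo; `run_with_oom_report` and `fake_inference` are hypothetical names:

```python
# Sketch (not part of the CogVideo repo): wrap the inference call so the
# printed traceback pinpoints the exact line that raised the CUDA OOM.
import traceback

def run_with_oom_report(fn):
    try:
        return fn()
    except RuntimeError as e:  # torch.cuda.OutOfMemoryError subclasses RuntimeError
        if "out of memory" in str(e).lower():
            traceback.print_exc()  # prints the file/line of the failed allocation
            # with torch available, torch.cuda.memory_summary() adds allocator stats
        raise

# toy demonstration with a simulated OOM instead of a real GPU allocation
def fake_inference():
    raise RuntimeError("CUDA out of memory. Tried to allocate 56.50 GiB.")

try:
    run_with_oom_report(fake_inference)
except RuntimeError:
    print("OOM located via traceback")
```

In a real run you would pass the pipeline call (e.g. `lambda: pipe(prompt=...)`) instead of `fake_inference`.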
I don't know how to use that to locate the OOM code line. Maybe this log will be helpful.
And I will test the cogvideox-dev branch as soon as possible.
I had the same problem with a V100, and it was solved by switching to an A10. It seems to be a graphics-card problem.
I think so too. I found that the V100 does not support bf16. I switched the dtype to fp16 and it worked (main branch). So it might not be necessary to test the dev branch. However, I don't know exactly why the V100's lack of bf16 support leads to OOM. Maybe the automatic type conversion multiplies the VRAM consumption, I guess.
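As a rough rule of thumb, BF16 requires CUDA compute capability 8.0 (Ampere) or newer, while the V100 is sm_70 and the T4 is sm_75. A minimal sketch of the dtype choice (the `pick_dtype` helper is hypothetical, not part of CogVideo or diffusers):

```python
# Hypothetical helper: choose a torch dtype name from the GPU's CUDA
# compute capability. BF16 needs Ampere (sm_80) or newer; the V100
# (sm_70) and T4 (sm_75) must fall back to FP16.
def pick_dtype(compute_capability):
    major, _minor = compute_capability
    if major >= 8:       # Ampere (A100, A10, RTX 30xx) or newer
        return "bfloat16"
    return "float16"     # Volta (V100), Turing (T4), and older

print(pick_dtype((7, 0)))  # V100 -> float16
print(pick_dtype((8, 0)))  # A100 -> bfloat16
```

On a real system the capability comes from torch.cuda.get_device_capability(), and the returned name maps to a dtype via getattr(torch, ...).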
I'm seeing this on my AMD RX 6900 XT. Changing the dtype does not have any effect, though. Could this have something to do with Flash Attention or Memory efficient attention support? I know that on my GPU neither of those work.
We need to look into this issue. The desktop 3060 has only 12 GB but can run the 5B model normally, yet developers report that the V100 32 GB has problems running the 5B model while the 2B model runs fine. I will check whether it is a precision issue.
OK, I just tested it on the dev branch, and the same issue occurred. It also shows as 56.50G
Check whether several key settings are enabled:

- Do not attempt to enable online quantization; it may cause errors on this GPU architecture.
- Check these key lines:
pipe = CogVideoXPipeline.from_pretrained("THUDM/CogVideoX-5b", torch_dtype=torch.float16)
pipe = CogVideoXPipeline.from_pretrained(
    "THUDM/CogVideoX-5b",
    text_encoder=text_encoder,
    transformer=transformer,
    vae=vae,
    torch_dtype=torch.float16,
)
You must use FP16 on the T4 unless you are using a GPU with Ampere or higher architecture that supports BF16. Additionally, do not use .to(device): CPU offload keeps the model in CPU memory and streams it to the GPU as needed, rather than transferring the entire model to the GPU at once.
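The difference between .to(device) and CPU offload can be shown with a toy calculation. The `peak_vram` helper and the module sizes below are made up for illustration; real numbers depend on the model and precision:

```python
# Toy illustration (not real diffusers code): .to("cuda") keeps every
# submodule resident on the GPU at once, while model CPU offload keeps
# only the currently running submodule on the GPU.
def peak_vram(module_sizes_gb, offload):
    if offload:
        return max(module_sizes_gb)   # only the largest submodule is resident
    return sum(module_sizes_gb)       # everything resident simultaneously

sizes_gb = [5, 10, 11]  # made-up sizes for text_encoder, transformer, vae
print(peak_vram(sizes_gb, offload=False))  # 26
print(peak_vram(sizes_gb, offload=True))   # 11
```

This is why the same 5B model that OOMs with .to(device) can fit on a 12 GB card with offload enabled, at the cost of extra transfer time per step.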
- Finally, check whether these four memory-saving options are enabled:
pipe.enable_model_cpu_offload()
pipe.enable_sequential_cpu_offload()
pipe.vae.enable_slicing()
pipe.vae.enable_tiling()
I am already running it normally on a T4 in Colab.
Please check whether this helps you.
So it seems the V100 cannot run in BF16 mode, but FP16 mode is not as good as BF16.
Will you release a dedicated FP16 version of the 5B model?
We tried, but the results weren’t ideal. The 5B model is currently recommended to run at BF16 precision, which is also the precision we used for training. Converting to FP16 leads to suboptimal performance. However, the 2B model has lower compatibility requirements and can run effectively in FP16.
free colab: https://github.com/camenduru/CogVideoX-5B-jupyter
Using FP16 on the T4 still errors: https://colab.research.google.com/drive/14TTaDTM3_lk69qKb5u4-1_gm_YK6lM3m?usp=sharing
# Create pipeline and run inference
pipe = CogVideoXPipeline.from_pretrained(
    "THUDM/CogVideoX-5b",
    text_encoder=text_encoder,
    transformer=transformer,
    vae=vae,
    torch_dtype=torch.float16,
)
pipe.enable_model_cpu_offload()
# pipe.enable_sequential_cpu_offload()
pipe.vae.enable_slicing()
pipe.vae.enable_tiling()
pipe.enable_sequential_cpu_offload() does not work
Why does it not work? It should be used with diffusers>=0.30.1 and the FP16 model, not INT8.
I use diffusers 0.30.2, and not INT8; you can see my code at this link: https://colab.research.google.com/drive/14TTaDTM3_lk69qKb5u4-1_gm_YK6lM3m?usp=sharing
Where is the Colab link? I can't see it; please post it below. Can the T4 run it?
https://github.com/camenduru/CogVideoX-5B-jupyter
Does generation take more than an hour?
There is no need, but it does take a long time (about 20 minutes with similar code in my T4 Colab). This is due to the computational limitations of this generation of GPUs: to compress memory usage, many time-for-space tricks are used, which makes generation very slow. Additionally, the T4 cannot run BF16 models, and the quality of FP16 inference cannot be guaranteed, so we recommend using newer GPUs for inference and fine-tuning.
It takes about 1 hour.
Hi, I was guided to post here. I am receiving an error while generating video. Processing starts, the steps proceed very quickly, and then... ERROR. I have tried:
- switching between both models (2B/5B)
- reducing the steps, guidance, etc.
- switching float types
My PC is a Dell T7810 with dual E5-2643 v4 CPUs, 64 GB DDR4 RAM, and dual P6000 GPUs (24 GB each, 48 GB total), running Ubuntu 24.04 LTS Desktop.
I am not sure if it's a memory issue; I thought this should work with one 24 GB card as per the app's description. I am using Pinokio and got this last error: File "/home/mandy/pinokio/api/cogvideo.git/app/env/lib/python3.10/site-packages/diffusers/models/attention_processor.py", line 1934, in __call__ hidden_states = F.scaled_dot_product_attention( torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 35.31 GiB. GPU
I haven't tried the P6000; that GPU is a bit too old. Perhaps you should check your versions and carefully read our CLI code to ensure your device has properly loaded the model and is using CPU offload for inference.
