CogVideo
🍊 Jupyter Notebook
Thanks for the project ❤️ I made a jupyter notebook 🥳 I hope you like it.
https://github.com/camenduru/CogVideoX-5B-jupyter
I believe this is a good start. If you can provide detailed explanations for each step and make it runnable on T4 devices (which seem to be free), we would be happy to create a link for you so that everyone can get started directly on Colab. Looking forward to your updates.
The free T4 has only 12.7 GB of system RAM.
We need torch.float8_e4m3fn for the T5EncoderModel, or maybe this: https://github.com/pytorch/ao. I will try that.
import torch
from transformers import T5EncoderModel
text_encoder = T5EncoderModel.from_pretrained("/content/CogVideoX-5b", subfolder="text_encoder", torch_dtype=torch.float8_e4m3fn)
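For context, a rough back-of-the-envelope estimate shows why FP8 is attractive on a 12.7 GB machine. The ~4.7B parameter count for the T5-XXL encoder is an assumption for illustration, not an exact figure:

```python
# Rough weight-memory estimate for the T5-XXL text encoder.
# PARAMS is an assumed approximate parameter count, not an exact figure.
PARAMS = 4.7e9

BYTES_PER_PARAM = {
    "float32": 4,
    "bfloat16": 2,
    "float8_e4m3fn": 1,
}

for dtype, nbytes in BYTES_PER_PARAM.items():
    gib = PARAMS * nbytes / 1024**3
    print(f"{dtype}: ~{gib:.1f} GiB of weights")
```

Under these assumptions, the encoder's weights alone are roughly 8.8 GiB in BF16 but only about 4.4 GiB in FP8, which is the difference between overflowing and fitting next to the rest of the pipeline.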
This method cannot run on a T4; it is designed for the H100, and the FP8 format can only run properly on H100. You should try adding these two lines instead:
pipe.enable_sequential_cpu_offload()
pipe.vae.enable_slicing()
This way, the 2B model in FP16 can run in about 2.5 GB of VRAM, while the 5B model will use about 6 GB. However, generating a video will be extremely slow; this is a trade-off of time for space.
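For anyone following along, here is a minimal sketch of what that suggestion looks like end to end, assuming the weights were downloaded to /content/CogVideoX-5b as in the earlier snippets. The imports are deferred so the function can be defined even where torch and diffusers are not installed:

```python
def build_low_vram_pipeline(model_dir="/content/CogVideoX-5b"):
    # Deferred imports so this sketch can be defined without a GPU environment.
    import torch
    from diffusers import CogVideoXPipeline

    pipe = CogVideoXPipeline.from_pretrained(model_dir, torch_dtype=torch.float16)
    # Stream submodules to the GPU one at a time instead of keeping the whole
    # model resident: much slower, but fits in a few GB of VRAM.
    pipe.enable_sequential_cpu_offload()
    # Decode the latent video in slices to cap peak VAE memory.
    pipe.vae.enable_slicing()
    return pipe
```

With this in place, generation proceeds as usual via pipe(prompt=...), just much more slowly, because every layer is shuttled between CPU and GPU.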
But why does torch_dtype=torch.float8_e4m3fn work with transformer and vae but not with text_encoder on a T4?
import torch
from diffusers import AutoencoderKLCogVideoX, CogVideoXDDIMScheduler, CogVideoXPipeline, CogVideoXTransformer3DModel
from transformers import T5EncoderModel, T5Tokenizer
transformer = CogVideoXTransformer3DModel.from_pretrained("/content/CogVideoX-5b", subfolder="transformer", torch_dtype=torch.float8_e4m3fn)
# text_encoder = T5EncoderModel.from_pretrained("/content/CogVideoX-5b", subfolder="text_encoder", torch_dtype=torch.float8_e4m3fn)
vae = AutoencoderKLCogVideoX.from_pretrained("/content/CogVideoX-5b", subfolder="vae", torch_dtype=torch.float8_e4m3fn)
scheduler = CogVideoXDDIMScheduler.from_pretrained("/content/CogVideoX-5b", subfolder="scheduler")
tokenizer = T5Tokenizer.from_pretrained("/content/CogVideoX-5b", subfolder="tokenizer")
# pipe = CogVideoXPipeline(transformer=transformer, text_encoder=text_encoder, vae=vae, tokenizer=tokenizer, scheduler=scheduler)
Same result with an H100.
The FP8 weights we currently provide were converted with torchao; as noted in the README, the T5 part still retains BF16 and has not been converted.
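For reference, a weight-only FP8 conversion with torchao looks roughly like the sketch below. This assumes a recent torchao that exposes quantize_ and float8_weight_only; check the repo README for the exact recipe that was actually used. Imports are deferred so the sketch only runs where torch, diffusers, and torchao are installed:

```python
def convert_transformer_to_fp8(model_dir="/content/CogVideoX-5b"):
    # Deferred imports: requires torch, diffusers, and a recent torchao.
    import torch
    from diffusers import CogVideoXTransformer3DModel
    from torchao.quantization import quantize_, float8_weight_only

    transformer = CogVideoXTransformer3DModel.from_pretrained(
        model_dir, subfolder="transformer", torch_dtype=torch.bfloat16
    )
    # Weight-only FP8: weights are stored in float8 and dequantized for
    # compute, unlike loading with torch_dtype=torch.float8_e4m3fn directly.
    quantize_(transformer, float8_weight_only())
    return transformer
```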
Maybe with this https://github.com/kijai/ComfyUI-CogVideoXWrapper and this https://huggingface.co/mcmonkey/google_t5-v1_1-xxl_encoderonly/tree/main, we can run it on a free T4.
@camenduru updates on this?
@PyroFilmsFX here: https://github.com/camenduru/CogVideoX-5B-jupyter/blob/main/CogVideoX_5B_jupyter_free.ipynb
Good afternoon. Sorry for the question, I don't understand programming. Which CogVideo file should I add these lines to?
pipe.enable_sequential_cpu_offload()
pipe.vae.enable_slicing()