CogVideo
🍊 Jupyter Notebook
Thanks for the project ❤️ I made a jupyter notebook 🥳 I hope you like it.
https://github.com/camenduru/CogVideoX-5B-jupyter
I believe this is a good start. If you can provide detailed explanations for each step and make it runnable on T4 devices (which seem to be free), we would be happy to create a link for you so that everyone can get started directly on Colab. Looking forward to your updates.
The free T4 has only 12.7 GB of system RAM.
We need torch.float8_e4m3fn for the T5EncoderModel, or maybe this: https://github.com/pytorch/ao. I will try that.
import torch
from transformers import T5EncoderModel
text_encoder = T5EncoderModel.from_pretrained("/content/CogVideoX-5b", subfolder="text_encoder", torch_dtype=torch.float8_e4m3fn)
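For context, a rough back-of-the-envelope estimate shows why FP8 is attractive on a 12.7 GB machine. The ~4.7B parameter count for the T5-XXL encoder is an assumption for illustration, not an exact figure:

```python
# Rough weight-memory estimate for the T5-XXL text encoder.
# PARAMS is an assumed approximate parameter count, not an exact figure.
PARAMS = 4.7e9

BYTES_PER_PARAM = {
    "float32": 4,
    "bfloat16": 2,
    "float8_e4m3fn": 1,
}

for dtype, nbytes in BYTES_PER_PARAM.items():
    gib = PARAMS * nbytes / 1024**3
    print(f"{dtype}: ~{gib:.1f} GiB of weights")
```

Under these assumptions, the encoder's weights alone are roughly 8.8 GiB in BF16 but only about 4.4 GiB in FP8, which is the difference between overflowing and fitting next to the rest of the pipeline.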
This method cannot run on a T4; it is designed for the H100, and the FP8 format can only run properly on H100. You should try adding these two lines instead:
pipe.enable_sequential_cpu_offload()
pipe.vae.enable_slicing()
This way, the 2B model in FP16 can run in about 2.5 GB of VRAM, while the 5B model will use about 6 GB. However, generating a video will be extremely slow; this is a trade-off of time for space.
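For anyone following along, here is a minimal sketch of what that suggestion looks like end to end, assuming the weights were downloaded to /content/CogVideoX-5b as in the earlier snippets. The imports are deferred so the function can be defined even where torch and diffusers are not installed:

```python
def build_low_vram_pipeline(model_dir="/content/CogVideoX-5b"):
    # Deferred imports so this sketch can be defined without a GPU environment.
    import torch
    from diffusers import CogVideoXPipeline

    pipe = CogVideoXPipeline.from_pretrained(model_dir, torch_dtype=torch.float16)
    # Stream submodules to the GPU one at a time instead of keeping the whole
    # model resident: much slower, but fits in a few GB of VRAM.
    pipe.enable_sequential_cpu_offload()
    # Decode the latent video in slices to cap peak VAE memory.
    pipe.vae.enable_slicing()
    return pipe
```

With this in place, generation proceeds as usual via pipe(prompt=...), just much more slowly, because every layer is shuttled between CPU and GPU.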
But why does torch_dtype=torch.float8_e4m3fn work with transformer and vae but not with text_encoder on a T4?
import torch
from diffusers import AutoencoderKLCogVideoX, CogVideoXDDIMScheduler, CogVideoXPipeline, CogVideoXTransformer3DModel
from transformers import T5EncoderModel, T5Tokenizer
transformer = CogVideoXTransformer3DModel.from_pretrained("/content/CogVideoX-5b", subfolder="transformer", torch_dtype=torch.float8_e4m3fn)
# text_encoder = T5EncoderModel.from_pretrained("/content/CogVideoX-5b", subfolder="text_encoder", torch_dtype=torch.float8_e4m3fn)
vae = AutoencoderKLCogVideoX.from_pretrained("/content/CogVideoX-5b", subfolder="vae", torch_dtype=torch.float8_e4m3fn)
scheduler = CogVideoXDDIMScheduler.from_pretrained("/content/CogVideoX-5b", subfolder="scheduler")
tokenizer = T5Tokenizer.from_pretrained("/content/CogVideoX-5b", subfolder="tokenizer")
# pipe = CogVideoXPipeline(transformer=transformer, text_encoder=text_encoder, vae=vae, tokenizer=tokenizer, scheduler=scheduler)
Same result with an H100.
The FP8 weights we currently provide were converted with torchao; as noted in the README, the T5 part still retains BF16 and has not been converted.
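For reference, a weight-only FP8 conversion with torchao looks roughly like the sketch below. This assumes a recent torchao that exposes quantize_ and float8_weight_only; check the repo README for the exact recipe that was actually used. Imports are deferred so the sketch only runs where torch, diffusers, and torchao are installed:

```python
def convert_transformer_to_fp8(model_dir="/content/CogVideoX-5b"):
    # Deferred imports: requires torch, diffusers, and a recent torchao.
    import torch
    from diffusers import CogVideoXTransformer3DModel
    from torchao.quantization import quantize_, float8_weight_only

    transformer = CogVideoXTransformer3DModel.from_pretrained(
        model_dir, subfolder="transformer", torch_dtype=torch.bfloat16
    )
    # Weight-only FP8: weights are stored in float8 and dequantized for
    # compute, unlike loading with torch_dtype=torch.float8_e4m3fn directly.
    quantize_(transformer, float8_weight_only())
    return transformer
```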
Maybe with this https://github.com/kijai/ComfyUI-CogVideoXWrapper and this https://huggingface.co/mcmonkey/google_t5-v1_1-xxl_encoderonly/tree/main, we can run it on a free T4.
@camenduru updates on this?
@PyroFilmsFX here: https://github.com/camenduru/CogVideoX-5B-jupyter/blob/main/CogVideoX_5B_jupyter_free.ipynb
Good afternoon. Sorry for the question, I don't understand programming. Which CogVideo file should I add these lines to?
pipe.enable_sequential_cpu_offload()
pipe.vae.enable_slicing()