
🍊 Jupyter Notebook

camenduru opened this issue 1 year ago • 6 comments

Thanks for the project ❤️ I made a Jupyter notebook 🥳 I hope you like it.

https://github.com/camenduru/CogVideoX-5B-jupyter

camenduru avatar Aug 28 '24 10:08 camenduru

I believe this is a good start. If you can provide detailed explanations for each step and make it runnable on T4 devices (which seem to be available for free), we would be happy to create a link for you so that everyone can get started directly on Colab. Looking forward to your updates.

zRzRzRzRzRzRzR avatar Aug 28 '24 12:08 zRzRzRzRzRzRzR

Free T4 has only 12.7 GB of system RAM.

We need torch.float8_e4m3fn for T5EncoderModel, or maybe this: https://github.com/pytorch/ao. I will try that.

import torch
from transformers import T5EncoderModel

text_encoder = T5EncoderModel.from_pretrained("/content/CogVideoX-5b", subfolder="text_encoder", torch_dtype=torch.float8_e4m3fn)
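
Something like this is what I have in mind for the torchao route; untested, and int8 weight-only quantization is just my assumption of the right entry point (quantize_ and int8_weight_only come from torchao's quantization API):

import torch
from transformers import T5EncoderModel
from torchao.quantization import quantize_, int8_weight_only

# Load the encoder in FP16 first, then quantize its weights to int8 in place,
# which should roughly halve the encoder's memory footprint again.
text_encoder = T5EncoderModel.from_pretrained("/content/CogVideoX-5b", subfolder="text_encoder", torch_dtype=torch.float16)
quantize_(text_encoder, int8_weight_only())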


camenduru avatar Aug 29 '24 02:08 camenduru

This method cannot run on a T4; it is designed for the H100, and the FP8 format can only run properly on an H100. You should try adding these two lines instead:

pipe.enable_sequential_cpu_offload()  
pipe.vae.enable_slicing()

This way, the 2B model in FP16 can run in 2.5 GB of VRAM, while the 5B model will use 6 GB of VRAM. However, generating a video will be extremely slow; this is a trade-off of time for space.
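
For reference, a minimal end-to-end sketch with those two lines in place (the local model path and the prompt are placeholders; FP16 matches the 2B recipe above, while the 5B model is normally run in BF16):

import torch
from diffusers import CogVideoXPipeline
from diffusers.utils import export_to_video

pipe = CogVideoXPipeline.from_pretrained("/content/CogVideoX-2b", torch_dtype=torch.float16)
pipe.enable_sequential_cpu_offload()  # move submodules onto the GPU one at a time
pipe.vae.enable_slicing()             # decode the video latents in slices to cap VRAM

video = pipe(prompt="a panda playing guitar in a bamboo forest", num_inference_steps=50).frames[0]
export_to_video(video, "output.mp4", fps=8)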

zRzRzRzRzRzRzR avatar Aug 29 '24 05:08 zRzRzRzRzRzRzR

But why does torch_dtype=torch.float8_e4m3fn work with transformer and vae but not with text_encoder on a T4?

import torch
from diffusers import AutoencoderKLCogVideoX, CogVideoXPipeline, CogVideoXTransformer3DModel, CogVideoXDDIMScheduler
from transformers import T5EncoderModel, T5Tokenizer

# FP8 loading works for the transformer and the VAE:
transformer = CogVideoXTransformer3DModel.from_pretrained("/content/CogVideoX-5b", subfolder="transformer", torch_dtype=torch.float8_e4m3fn)
vae = AutoencoderKLCogVideoX.from_pretrained("/content/CogVideoX-5b", subfolder="vae", torch_dtype=torch.float8_e4m3fn)
scheduler = CogVideoXDDIMScheduler.from_pretrained("/content/CogVideoX-5b", subfolder="scheduler")
tokenizer = T5Tokenizer.from_pretrained("/content/CogVideoX-5b", subfolder="tokenizer")

# ...but the same dtype fails for the text encoder:
# text_encoder = T5EncoderModel.from_pretrained("/content/CogVideoX-5b", subfolder="text_encoder", torch_dtype=torch.float8_e4m3fn)

# pipe = CogVideoXPipeline(transformer=transformer, text_encoder=text_encoder, vae=vae, tokenizer=tokenizer, scheduler=scheduler)

camenduru avatar Aug 29 '24 05:08 camenduru

Same error with an H100.


camenduru avatar Aug 29 '24 06:08 camenduru

The FP8 weights we currently provide were converted with torchao; you can see this in the README. The T5 part still remains in BF16 and has not been converted.
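
So on your side, a sketch of the same split would keep only the transformer and VAE in FP8 storage and load T5 in BF16, reusing the objects from your snippet above (untested, just to show the split):

import torch
from transformers import T5EncoderModel
from diffusers import CogVideoXPipeline

# T5 stays in BF16, matching the released FP8 checkpoints.
text_encoder = T5EncoderModel.from_pretrained("/content/CogVideoX-5b", subfolder="text_encoder", torch_dtype=torch.bfloat16)

# Note: FP8 here is storage only; the transformer/vae weights would still
# need to be upcast (e.g. to BF16) before actual computation.
pipe = CogVideoXPipeline(transformer=transformer, text_encoder=text_encoder, vae=vae, tokenizer=tokenizer, scheduler=scheduler)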

zRzRzRzRzRzRzR avatar Aug 29 '24 08:08 zRzRzRzRzRzRzR

Maybe with this https://github.com/kijai/ComfyUI-CogVideoXWrapper and this https://huggingface.co/mcmonkey/google_t5-v1_1-xxl_encoderonly/tree/main, we can run it on a free T4.
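
Untested, but loading that encoder-only repack would look something like this (repo id taken from the link above; the tokenizer can still come from the CogVideoX folder):

import torch
from transformers import T5EncoderModel

# Encoder-only repack: the unused T5 decoder weights are never downloaded.
text_encoder = T5EncoderModel.from_pretrained("mcmonkey/google_t5-v1_1-xxl_encoderonly", torch_dtype=torch.float16)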

camenduru avatar Aug 29 '24 14:08 camenduru

@camenduru updates on this?

PyroFilmsFX avatar Aug 29 '24 19:08 PyroFilmsFX

@PyroFilmsFX here: https://github.com/camenduru/CogVideoX-5B-jupyter/blob/main/CogVideoX_5B_jupyter_free.ipynb

camenduru avatar Aug 31 '24 11:08 camenduru

Good afternoon. Sorry for the question; I don't understand programming. Which CogVideo file should I add these lines to?

pipe.enable_sequential_cpu_offload()
pipe.vae.enable_slicing()

Lora-88 avatar Sep 09 '24 11:09 Lora-88