CogVideo
Slow model loading and very long delay in image-to-video generation
Hi,
Model loading is slow, and generating a video from an image also takes a long time. Currently it takes 8 minutes for an 8-second video. I have 48 GB of VRAM, but it is still very slow.
Please let me know if there is any way to solve this.
This is the code I'm using:
import torch
from diffusers import (
    CogVideoXDPMScheduler,
    CogVideoXImageToVideoPipeline,
    CogVideoXPipeline,
    CogVideoXTransformer3DModel,
    CogVideoXVideoToVideoPipeline,
)
from diffusers.utils import export_to_video, load_image
print('loading I2V model...')
pipe_image = CogVideoXImageToVideoPipeline.from_pretrained(
    "THUDM/CogVideoX-5b-I2V",
    transformer=CogVideoXTransformer3DModel.from_pretrained(
        "THUDM/CogVideoX-5b-I2V", subfolder="transformer", torch_dtype=torch.bfloat16
    ),
    torch_dtype=torch.bfloat16,
).to("cuda")
import random
seed = random.randint(0, 2**8 - 1)
print('loading image..')
image = load_image(
    "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/astronaut.jpg"
)
prompt = "An astronaut hatching from an egg, on the surface of the moon, the darkness and depth of space realised in the background. High quality, ultrarealistic detail and breath-taking movie-like camera shot."
negative_prompt ="The video is not of a high quality, it has a low resolution. Strange motion trajectory. Flickering, Blurriness, Face restore.Deformation, anime, cartoon, graphic, text, painting, crayon, graphite, abstract, glitch, deformed, mutated, ugly, disfigured "
video_pt = pipe_image(
    image=image,
    prompt=prompt,
    negative_prompt=negative_prompt,
    num_inference_steps=50,
    num_videos_per_prompt=1,
    use_dynamic_cfg=True,
    output_type="pt",
    guidance_scale=7.0,
    num_frames=49,
    generator=torch.Generator(device="cuda").manual_seed(seed),
).frames
batch_video_frames = []
batch_size = video_pt.shape[0]
from diffusers.image_processor import VaeImageProcessor
for batch_idx in range(batch_size):
    pt_image = video_pt[batch_idx]
    pt_image = torch.stack([pt_image[i] for i in range(pt_image.shape[0])])
    # Convert this video's frame tensor to a list of PIL images
    image_np = VaeImageProcessor.pt_to_numpy(pt_image)
    image_pil = VaeImageProcessor.numpy_to_pil(image_np)
    batch_video_frames.append(image_pil)
export_to_video(batch_video_frames[0], "videos/output.mp4", fps=8)
Thanks in Advance
What GPU are you using? It shouldn't be this slow. Also, the video should be 6 seconds long; can you calculate how long the average step took?
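The numbers in the first post make both points easy to check. A minimal sketch of the arithmetic, assuming the reported wall-clock time is spent mostly in the 50 denoising steps (it also includes model loading and VAE decoding, so this overestimates the per-step cost); the helper names are hypothetical:

```python
def video_duration_seconds(num_frames: int, fps: int) -> float:
    """Length of the exported clip: frame count divided by frame rate."""
    return num_frames / fps


def average_step_seconds(total_minutes: float, num_inference_steps: int) -> float:
    """Per-step time if all wall-clock time is attributed to denoising steps."""
    return total_minutes * 60 / num_inference_steps


# Settings from the first post: num_frames=49, fps=8, num_inference_steps=50.
print(video_duration_seconds(49, 8))  # 6.125 seconds of video
print(average_step_seconds(8, 50))    # 9.6 seconds per step for the reported 8 minutes
```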
The GPU details are below:
torch_dtype=torch.bfloat16
)
pipe_image.enable_sequential_cpu_offload()
seed = random.randint(0, 2**8 - 1)
prompt='A worker talking to his supervisor at a construction site. High quality, masterpiece, best quality, highres, ultra-detailed, fantastic.'
img_path='images/image_3.png'
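from PIL import Image
pil_image = Image.open(img_path).resize(size=(720, 480))
# Pass the resized image; load_image(img_path) would reload the original file and ignore pil_image
image = load_image(pil_image)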
negative_prompt ="The video is not of a high quality, it has a low resolution. Strange motion trajectory. Flickering, Blurriness, Face restore.Deformation, anime, cartoon, graphic, text, painting, crayon, graphite, abstract, glitch, deformed, mutated, ugly, disfigured "
video_pt = pipe_image(
    image=image,
    prompt=prompt,
    negative_prompt=negative_prompt,
    num_inference_steps=50,
    num_videos_per_prompt=1,
    use_dynamic_cfg=True,
    output_type="pt",
    guidance_scale=7.0,
    num_frames=49,
    generator=torch.Generator(device="cuda").manual_seed(seed),
).frames
Please let me know if I'm doing something wrong.
This code is correct; I don't see any errors.
video_pt = pipe_image(
    image=image,
    prompt=prompt,
    negative_prompt=negative_prompt,
    num_inference_steps=50,
    num_videos_per_prompt=1,
    use_dynamic_cfg=True,
    output_type="pt",
    guidance_scale=7.0,
    num_frames=49,
    generator=torch.Generator(device="cuda").manual_seed(seed),
).frames[0]
Did this step alone take 14 minutes? Our speed test only measures this step.
This is clearly not the performance an A6000 should deliver; even a T4 is faster than this.
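To time only the denoising loop, as the speed test does, one option is a timing callback. A minimal sketch, assuming the pipeline's `callback_on_step_end` hook, which diffusers calls as `callback(pipeline, step, timestep, callback_kwargs)` and which must return the `callback_kwargs` dict; `StepTimer` itself is a hypothetical helper, not part of diffusers:

```python
import time
from typing import Any, Dict, List


class StepTimer:
    """Records wall-clock time between denoising steps (hypothetical helper).

    __call__ matches the diffusers callback_on_step_end signature:
    (pipeline, step_index, timestep, callback_kwargs) -> callback_kwargs.
    """

    def __init__(self) -> None:
        self._last = time.perf_counter()
        self.durations: List[float] = []

    def __call__(self, pipeline: Any, step: int, timestep: Any,
                 callback_kwargs: Dict[str, Any]) -> Dict[str, Any]:
        now = time.perf_counter()
        self.durations.append(now - self._last)
        self._last = now
        # Return the tensors unchanged so the pipeline continues as normal.
        return callback_kwargs

    def average(self, skip_first: bool = True) -> float:
        # The first interval also covers prompt/image encoding before step 0.
        durations = self.durations[1:] if skip_first else self.durations
        return sum(durations) / len(durations) if durations else 0.0
```

Create the timer right before the call, pass `callback_on_step_end=timer` to `pipe_image(...)`, and read `timer.average()` afterwards for seconds per step. Because CUDA kernels run asynchronously, individual intervals can be uneven, but the average over 50 steps should be a reasonable indicator.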
Yes, surprisingly, it is taking 14 minutes.
Hi @zRzRzRzRzRzRzR
How long does it take you to generate a 6-second video?
I use an A100, and it takes 180 seconds with the 5B model.
Can you please share the code? I want to test it on the A6000.
I used a 3090 with the default cli_demo, and it takes 12 minutes for a 6-second video.
It used very little VRAM. Is this the expected speed? @zRzRzRzRzRzRzR
Same for me. With I2V, it takes about 10 minutes on an RTX 4090, and only about 3 GB of VRAM is used. I added the following code:
pipe_image.enable_sequential_cpu_offload()
pipe_image.vae.enable_tiling()
It takes time, but since there is plenty of VRAM available, it seems that performance could be improved further, for example by increasing the resolution and length. Please continue with the development. Also, would it be difficult to show the video while inference is still running?
If generating the video takes a long time, it is a problem not to be able to judge the result until the video is complete. It would be good to see intermediate results, even at a low resolution and low frame rate.
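The low VRAM readings reported above (about 3 GB with `enable_sequential_cpu_offload()` enabled) can be checked against what PyTorch actually allocates. A small sketch using PyTorch's allocator statistics; `peak_vram_gib` is a hypothetical helper, and it counts tensor allocations only, so nvidia-smi will usually show somewhat more:

```python
def peak_vram_gib(reset: bool = False):
    """Peak GPU memory allocated by PyTorch tensors, in GiB (None without a GPU)."""
    try:
        import torch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    peak = torch.cuda.max_memory_allocated() / 1024**3
    if reset:
        # Start a fresh measurement window for the next call.
        torch.cuda.reset_peak_memory_stats()
    return peak


print(peak_vram_gib())
```

Calling `peak_vram_gib(reset=True)` just before `pipe_image(...)` and `peak_vram_gib()` right after it gives the peak for that generation alone.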
On a 4090, you can remove
pipe_image.enable_sequential_cpu_offload()
entirely and just move the pipeline with pipe.to("cuda"); that should work. Currently, there is indeed no way to visualize the intermediate results.
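Why offloading is unnecessary on these cards can be seen from a back-of-the-envelope estimate. `enable_sequential_cpu_offload()` keeps the weights in CPU memory and copies each submodule to the GPU only when its forward pass runs, which saves VRAM at the cost of speed; that trade only pays off when the model would not fit otherwise. A minimal sketch, assuming about 5e9 transformer parameters (read off the "5b" in the model name, not an exact count) and counting weights only:

```python
def weight_footprint_gib(num_params: float, bytes_per_param: int = 2) -> float:
    """Weights-only memory in GiB; bf16 stores each parameter in 2 bytes."""
    return num_params * bytes_per_param / 1024**3


# Assumption: ~5e9 parameters for the THUDM/CogVideoX-5b-I2V transformer.
transformer_gib = weight_footprint_gib(5e9)
for card, vram_gb in [("RTX 4090", 24), ("A6000", 48)]:
    # Activations, the text encoder and the VAE need extra room on top of this.
    print(f"{card}: ~{transformer_gib:.1f} GiB of bf16 transformer weights vs {vram_gb} GB VRAM")
```

The estimate deliberately ignores activations and the other pipeline components, so it is a lower bound on what `.to("cuda")` needs, not a guarantee that everything fits.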
@zRzRzRzRzRzRzR
I'm using the following PyTorch and CUDA versions. Is this correct?
CUDA 12.1
pip install torch==2.4.0 torchvision==0.19.0 torchaudio==2.4.0 --index-url https://download.pytorch.org/whl/cu121
This should be fine; PyTorch 2.4.0 is also available as a build compiled against CUDA 12.1.
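To confirm that the wheel installed from the cu121 index is really a CUDA build and that it can see the GPU, a quick check along these lines may help. This is a sketch; `describe_torch_environment` is a hypothetical helper that only reports what PyTorch sees and does not benchmark anything:

```python
def describe_torch_environment() -> dict:
    """Report the installed PyTorch build and the first GPU it can see."""
    try:
        import torch
    except ImportError:
        return {"torch": None}
    info = {
        "torch": torch.__version__,
        "built_with_cuda": torch.version.cuda,  # None for CPU-only wheels
        "cuda_available": torch.cuda.is_available(),
    }
    if info["cuda_available"]:
        props = torch.cuda.get_device_properties(0)
        info["device"] = props.name
        info["total_vram_gib"] = round(props.total_memory / 1024**3, 1)
    return info


print(describe_torch_environment())
```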
@zRzRzRzRzRzRzR
Can you please share the code you are running on the A100?
https://github.com/THUDM/CogVideo/blob/main/inference/cli_demo.py
Follow this, but remove pipe_image.enable_sequential_cpu_offload() and use pipe.to("cuda") instead.
@zRzRzRzRzRzRzR I am using the code above and, as you can see, it takes 8-9 minutes for 6 seconds of video.
Hello! Any progress here? I have the same problem.
I think the main reason is that you need to add pipe = pipe.cuda() when copying the code from Colab.
Hi @xijiu9 ,
Check this code, https://github.com/THUDM/CogVideo/issues/316#issue-2537904293.
I have added .cuda(), but it still takes a very long time on Windows.
This speed is clearly incorrect. However, for hardware like yours, I suggest following this plan:
This will significantly increase the speed
Isn't enable_sequential_cpu_offload meant to save memory? How does it increase the speed?