CogVideo icon indicating copy to clipboard operation
CogVideo copied to clipboard

CogVideoX-5B may generate empty videos in some prompts

Open whh258 opened this issue 1 year ago • 4 comments

System Info / 系統信息

CogVideoX-5B

Information / 问题信息

  • [X] The official example scripts / 官方的示例脚本
  • [ ] My own modified scripts / 我自己修改的脚本和任务

Reproduction / 复现过程

Thanks for the open source of the team. The video quality generated by CogVideoX-5B is really good!

However, when using the huggingface diffusers library and default parameters, we encountered an issue where the generated video was empty: for example, prompt="Yellow curtains swaging near a blue sofa" or "Blue ink drops into water and dispersions".

We don't know what caused this, but when we reduced the guidance scale parameter and the text condition, the generated video returned to normal. Can you provide an explanation or solution to avoid this when run a large number of prompts?

Expected behavior / 期待表现

Provide an explanation or solution to avoid empty videos when run a large number of prompts

whh258 avatar Aug 30 '24 04:08 whh258

There is a big problem, your prompt is too short. Please carefully read our readme. We need to use long prompts as input, which requires you to use large language models like GPT-4 / GLM-4 to polish and input long prompts. Otherwise, this is a part that the model has not been trained on

zRzRzRzRzRzRzR avatar Aug 30 '24 05:08 zRzRzRzRzRzRzR

There is a big problem, your prompt is too short. Please carefully read our readme. We need to use long prompts as input, which requires you to use large language models like GPT-4 / GLM-4 to polish and input long prompts. Otherwise, this is a part that the model has not been trained on

Our prompt:A person wears a white t-shirt and beige pants, holding a donut with pink icing and sprinkles. They bring the donut close to their mouth in several frames. The pink background contrasts with their white and beige clothing and the red-toned donut.

using demo code of huggingface model page: video = pipe( prompt=prompt, num_videos_per_prompt=1, num_inference_steps=50, num_frames=49, guidance_scale=6, generator=torch.Generator(device="cuda").manual_seed(42), ).frames[0]

yunkchen avatar Aug 30 '24 07:08 yunkchen

I experienced the same problem with prompts like these:

  • Drone flying through the Salar de Uyuni reflection, mirror-like salt flat, perspective shot, Toyota Land Cruiser in the distance, clear blue sky, fluffy white clouds, 8K resolution, Hasselblad X1D, 24mm lens, f/11 aperture, 1/30s shutter speed, ISO 400.
  • Flying over the Great Blue Hole, stunning underwater sinkhole, crystal-clear waters, coral reefs, school of fish swimming below, 6K resolution, GoPro Hero8, wide-angle lens, f/2.8 aperture, 1/125s shutter speed, ISO 100.
  • Drone flying through the Aurora Borealis, Northern Lights, vibrant green and purple colors, snow-covered mountains, frozen lake, 5K resolution, DJI Mavic 2 Pro, 24mm lens, f/2.8 aperture, 1/30s shutter speed, ISO 1600.

tin2tin avatar Aug 30 '24 08:08 tin2tin

I recommend increasing the num_inference_steps to 100, this works for my case.

bopan3 avatar Sep 10 '24 09:09 bopan3