CogVideo Problematic result

System Info

When I input image:

I got results like this:

I ran:

python inference/cli_demo.py \
    --prompt "A majestic brown bear, its fur rich and textured, stands motionless in a lush, green forest. The scene is serene, with towering trees and dappled sunlight filtering through the leaves. Suddenly, the bear's head slowly turns towards the camera, its eyes locking gaze with the viewer, exuding a powerful, almost sentient presence. The rest of its body remains perfectly still, rooted to the spot, while the background, a tapestry of vibrant green foliage and earthy browns, remains static, emphasizing the bear's deliberate, captivating movement." \
    --generate_type "i2v" \
    --image_or_video_path "/home/user/ssd/data/swapping_data/bear/images/frame_00001.jpg" \
    --output_path "/home/user/hdd/CogVideo/results.mp4" \
    --model_path THUDM/CogVideoX-5b-I2V

The prompt was generated by your script; the original one was: "the bear turns its head towards the camera, body stays static, background stays static"

Information

[ ] The official example scripts
[ ] My own modified scripts

Reproduction

python inference/cli_demo.py \
    --prompt "A majestic brown bear, its fur rich and textured, stands motionless in a lush, green forest. The scene is serene, with towering trees and dappled sunlight filtering through the leaves. Suddenly, the bear's head slowly turns towards the camera, its eyes locking gaze with the viewer, exuding a powerful, almost sentient presence. The rest of its body remains perfectly still, rooted to the spot, while the background, a tapestry of vibrant green foliage and earthy browns, remains static, emphasizing the bear's deliberate, captivating movement." \
    --generate_type "i2v" \
    --image_or_video_path "/home/user/ssd/data/swapping_data/bear/images/frame_00001.jpg" \
    --output_path "/home/user/hdd/CogVideo/results.mp4" \
    --model_path THUDM/CogVideoX-5b-I2V

Expected behavior

The output should at least be a recognizable video.

Mar 26 '25 06:03 massyzs

@massyzs Hi! I ran into the same problem when using CogVideoX-5B-I2V to generate videos. Have you solved it yet? If so, could you share your solution?

Mar 28 '25 01:03 LazySheeeeeep

I'm not entirely sure why you're encountering this issue on your end. I recommend first checking that diffusers is updated to the latest version. If the problem persists, try using cogkit for inference.
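As a quick way to act on the first suggestion, the sketch below reports which diffusers version is installed so it can be compared against the latest release by hand. It uses only the Python standard library; the helper name `installed_version` is made up for illustration and is not part of diffusers or CogVideo.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version string of `package`, or None if it is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version("diffusers")
    # Compare the printed version against the latest release manually.
    print(f"diffusers: {found if found is not None else 'not installed'}")
```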

Mar 28 '25 05:03 OleehyO

I found that cli_demo.py has a bug: when num_frames and fps are not set, they default to 81 frames and 16 fps, but CogVideoX-5B-I2V expects 49 frames at 8 fps. Setting them to the correct values solved the problem.
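One way to avoid this kind of mismatch is to pick the defaults per model rather than globally. The sketch below is only an illustration: the numbers for CogVideoX-5b-I2V (49 frames, 8 fps) and the script-wide defaults (81 frames, 16 fps) come from this comment, while the lookup table and the helper `resolve_video_params` are hypothetical and do not exist in cli_demo.py.

```python
from typing import Dict, Optional, Tuple

# Script-wide fallback as reported in this thread: (num_frames, fps).
SCRIPT_DEFAULTS: Tuple[int, int] = (81, 16)

# Per-model values as reported in this thread; add entries only after verifying them.
MODEL_PARAMS: Dict[str, Tuple[int, int]] = {
    "cogvideox-5b-i2v": (49, 8),
}


def resolve_video_params(
    model_path: str,
    num_frames: Optional[int] = None,
    fps: Optional[int] = None,
) -> Tuple[int, int]:
    """Return (num_frames, fps), filling unset values from the model's entry."""
    # Use the last path component so "THUDM/CogVideoX-5b-I2V" and local paths both match.
    key = model_path.rstrip("/").split("/")[-1].lower()
    default_frames, default_fps = MODEL_PARAMS.get(key, SCRIPT_DEFAULTS)
    return (
        num_frames if num_frames is not None else default_frames,
        fps if fps is not None else default_fps,
    )


if __name__ == "__main__":
    print(resolve_video_params("THUDM/CogVideoX-5b-I2V"))
```

Explicitly supplied values still win over the table, so the helper only changes behavior when the caller leaves a parameter unset.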

Apr 07 '25 14:04 web3-luoxi

Can CogVideoX-1.0 5B-I2V only be used with num_frames set to 49? Can it handle other num_frames values of the form 8N + 1, such as 25? Currently, changing num_frames gives me mosaic-like output similar to the results shown above. Does it make sense to use the pretrained model for ControlNet post-training with a lower number of frames? Thanks!
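For reference, the 8N + 1 constraint mentioned here can be checked with a few lines of arithmetic. This is only a sketch with hypothetical helper names. It verifies the form of the number and says nothing about whether the pretrained model produces usable output at lengths other than 49, which is the open question.

```python
from typing import Tuple


def is_8n_plus_1(num_frames: int) -> bool:
    """True if num_frames can be written as 8 * N + 1 for some integer N >= 0."""
    return num_frames >= 1 and (num_frames - 1) % 8 == 0


def nearest_8n_plus_1(num_frames: int) -> Tuple[int, int]:
    """Return the closest 8N + 1 values at or below and at or above num_frames."""
    if num_frames < 1:
        raise ValueError("num_frames must be at least 1")
    lower = 8 * ((num_frames - 1) // 8) + 1
    upper = lower if lower == num_frames else lower + 8
    return lower, upper


if __name__ == "__main__":
    for candidate in (25, 30, 49, 81):
        print(candidate, is_8n_plus_1(candidate), nearest_8n_plus_1(candidate))
```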

Jun 24 '25 22:06 nnsriram27