CogVideo: Someone got an LCM sampler working with only 10 steps. Can you add it to the demo page and pipeline?

Feature request

His results are great: https://www.reddit.com/r/StableDiffusion/comments/1fwzaw9/cogvideo_i2v_working_with_lcm_with_only_10_steps/?sort=new

It would be amazing if you could add it to the demo here:

https://huggingface.co/spaces/THUDM/CogVideoX-5B-Space

Oct 06 '24 12:10 FurkanGozukara

I checked the link, and there are no model weights?

Oct 06 '24 21:10 jzhang38

I think it uses the CogVideoX 5B image-to-video model.

Oct 06 '24 22:10 FurkanGozukara

I think it requires some fine-tuning effort?

Oct 07 '24 07:10 foreverpiano

This is mostly a misunderstanding: the Reddit poster seems to have confused what LCM actually is. All they did was change the sampler; there is no distillation involved. I have noticed that the I2V model innately performs decently well at lower step counts, LCM sampler or not.

Oct 07 '24 07:10 kijai

@kijai But I tried using 20 steps, and it did not give a satisfactory output. The output is completely white.

python cli_demo.py --prompt "A golden retriever, sporting sleek black sunglasses, with its lengthy fur flowing in the breeze, sprints playfully across a rooftop terrace, recently refreshed by a light rain. The scene unfolds from a distance, the dog's energetic bounds growing larger as it approaches the camera, its tail wagging with unrestrained joy, while droplets of water glisten on the concrete behind it. The overcast sky provides a dramatic backdrop, emphasizing the vibrant golden coat of the canine as it dashes towards the viewer." \
    --model_path "/workspace/data/CogVideoX-5b" \
    --generate_type "t2v" \
    --num_inference_steps 20 \
    --guidance_scale 7.5

Oct 07 '24 12:10 foreverpiano

I think there are some tricks to reduce the steps.

Oct 07 '24 12:10 foreverpiano

Same here. Low step counts don't produce good results.

Oct 07 '24 12:10 FurkanGozukara

It won't work for text2video. From what I've seen, anything under 32 steps usually produces fully white outputs. image2video is different and works with as few as 7 steps.
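
As a rough illustration of the step counts mentioned above, here is a minimal Python sketch that checks a step count before launching a run. The function name `steps_likely_ok` and the `MIN_STEPS_REPORTED` table are hypothetical and not part of `cli_demo.py`; the thresholds are only the empirical observations from this thread (32 for t2v, 7 for i2v), not official limits.

```python
# Lower bounds on step counts as reported in this thread.
# These are empirical observations, not documented limits.
MIN_STEPS_REPORTED = {
    "t2v": 32,  # fewer steps usually produced fully white output
    "i2v": 7,   # reported to still work at this count
}


def steps_likely_ok(generate_type: str, num_inference_steps: int) -> bool:
    """Return True if the step count is at or above the reported lower bound."""
    if generate_type not in MIN_STEPS_REPORTED:
        raise ValueError(f"unknown generate_type: {generate_type!r}")
    return num_inference_steps >= MIN_STEPS_REPORTED[generate_type]


if __name__ == "__main__":
    for gen_type, steps in [("t2v", 20), ("t2v", 50), ("i2v", 10)]:
        verdict = "likely ok" if steps_likely_ok(gen_type, steps) else "likely too few"
        print(f"{gen_type} with {steps} steps: {verdict}")
```

A check like this would flag the t2v run with 20 steps before any time is spent on generation. Results below the listed i2v bound were not reported here, so a False for i2v only means "untested in this thread".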

Oct 07 '24 12:10 kijai

@kijai Got it. It is true that fewer than 32 steps returns fully white output for t2v.

Oct 07 '24 14:10 foreverpiano

I agree that they just use fewer steps for i2v.

Oct 07 '24 14:10 foreverpiano