Someone got the LCM sampler working with only 10 steps; can you add it to the demo page and the pipeline?
Feature request
Their results are great: https://www.reddit.com/r/StableDiffusion/comments/1fwzaw9/cogvideo_i2v_working_with_lcm_with_only_10_steps/?sort=new
It would be amazing if you could add it to the demo here:
https://huggingface.co/spaces/THUDM/CogVideoX-5B-Space
I checked the link, and there are no model weights?
I think it uses CogVideoX-5B image-to-video.
I think it would require some finetuning effort?
This is mostly a misunderstanding, as the Reddit poster seems to have confused what LCM actually is: all they did was change the sampler; there's no distillation involved. I have noticed that the I2V model innately performs decently well at lower steps, LCM sampler or not.
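For anyone who wants to try this in diffusers rather than ComfyUI, below is a minimal sketch of running the I2V model at a low step count with the stock pipeline (no LCM distillation, per the explanation above). The model ID, input image path, prompt, and the 10-step / 6.0-guidance values are placeholder assumptions for illustration, not settings confirmed in this thread.

```python
# Minimal sketch: CogVideoX-5B I2V at a low step count with the stock scheduler.
# Assumptions: a recent diffusers release with CogVideoXImageToVideoPipeline,
# a CUDA GPU, and an input image at "input.jpg" (placeholder path).
import torch
from diffusers import CogVideoXImageToVideoPipeline
from diffusers.utils import export_to_video, load_image

pipe = CogVideoXImageToVideoPipeline.from_pretrained(
    "THUDM/CogVideoX-5b-I2V",  # assumed model id; swap in your local path
    torch_dtype=torch.bfloat16,
)
pipe.enable_model_cpu_offload()  # trades speed for lower VRAM use

image = load_image("input.jpg")  # placeholder conditioning frame
video = pipe(
    prompt="A golden retriever sprints across a rooftop terrace after rain.",
    image=image,
    num_inference_steps=10,  # the low step count under discussion
    guidance_scale=6.0,
).frames[0]

export_to_video(video, "output.mp4", fps=8)
```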
@kijai But I tried with 20 steps, and it could not give satisfactory output; the output is all white.
python cli_demo.py --prompt "A golden retriever, sporting sleek black sunglasses, with its lengthy fur flowing in the breeze, sprints playfully across a rooftop terrace, recently refreshed by a light rain. The scene unfolds from a distance, the dog's energetic bounds growing larger as it approaches the camera, its tail wagging with unrestrained joy, while droplets of water glisten on the concrete behind it. The overcast sky provides a dramatic backdrop, emphasizing the vibrant golden coat of the canine as it dashes towards the viewer." \
--model_path "/workspace/data/CogVideoX-5b" \
--generate_type "t2v" \
--num_inference_steps 20 \
--guidance_scale 7.5
I think some tricks are needed to reduce the step count.
Same here; low step counts don't produce good results.
It won't work for text2video; from what I've seen, under 32 steps it usually produces fully white outputs. image2video is different and works down to 7 steps.
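If you want to reproduce the low-step i2v behavior with this repo's CLI, something along the following lines should work. I'm assuming cli_demo.py accepts "i2v" as a --generate_type and takes the conditioning image via --image_or_video_path; the model path, image path, and values are placeholders.

```shell
# Sketch: the same CLI as above, but running the I2V model at a low step count.
# --generate_type "i2v" and --image_or_video_path reflect my reading of
# cli_demo.py's arguments; all paths and values are placeholders.
python cli_demo.py --prompt "A golden retriever sprints across a rooftop terrace after rain." \
    --model_path "/workspace/data/CogVideoX-5b-I2V" \
    --generate_type "i2v" \
    --image_or_video_path "input.jpg" \
    --num_inference_steps 10 \
    --guidance_scale 6.0
```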
@kijai Got it. It is true that <32 steps returns fully white output for t2v.
I agree that they just used fewer steps for i2v.