Support Style reference, Seed image and Control parameters.

loretoparisi opened this issue 1 year ago • 4 comments

Feature request

Are there any plans to add support for style references, seed images, and control parameters?

Motivation

These capabilities have been speculated about in this article:

Input Processing:
- Text Input: The model accepts textual descriptions as input, likely utilizing advanced tokenization and embedding techniques to convert text into a format suitable for the neural network. This might involve a vocabulary size of 30,000 to 50,000 tokens, with each token embedded into a high-dimensional space (e.g., 768 or 1024 dimensions); see the sketch after this list.
- Potential Additional Inputs: While not confirmed, the model might also accept additional inputs such as style references, seed images, or control parameters to guide the video generation process.
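
For concreteness, here is a minimal sketch of what such tokenization and embedding could look like, assuming a T5-style text encoder; "t5-base" is an illustrative stand-in, not necessarily the encoder CogVideo actually uses:

# Minimal sketch of text tokenization + embedding. "t5-base" is illustrative:
# its ~32k-token vocabulary and 768-dim embeddings happen to fall in the
# ranges speculated above, but CogVideo's actual encoder may differ.
import torch
from transformers import AutoTokenizer, T5EncoderModel

tokenizer = AutoTokenizer.from_pretrained("t5-base")
encoder = T5EncoderModel.from_pretrained("t5-base")

prompt = "A panda playing guitar in a bamboo forest"
inputs = tokenizer(prompt, return_tensors="pt")
print(tokenizer.vocab_size)  # 32100 tokens, within the 30k-50k range above

with torch.no_grad():
    embeddings = encoder(**inputs).last_hidden_state
print(embeddings.shape)  # (1, seq_len, 768): each token in a 768-dim space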

Other T2V and T2I models already have extensive support for style, image, and control parameters.

Your contribution

loretoparisi Sep 06 '24 18:09

You can set the seed, but there is no way to set the style, because the training process hasn't been enhanced in that respect. We will try to improve this in the future. For seed settings, please refer to cli_demo.

zRzRzRzRzRzRzR Sep 07 '24 02:09

Thank you. In cli_demo I only see the generator used to seed the RNG, but no seed image:

video = pipe(
        prompt=prompt,
        num_videos_per_prompt=num_videos_per_prompt,  # Number of videos to generate per prompt
        num_inference_steps=num_inference_steps,  # Number of inference steps
        num_frames=49,  # Number of frames to generate; changed to 49 for diffusers version `0.31.0` and later
        use_dynamic_cfg=True,  # Used with the DPM scheduler; for the DDIM scheduler it should be False
        guidance_scale=guidance_scale,  # Guidance scale for classifier-free guidance; can be set to 7 for the DPM scheduler
        generator=torch.Generator().manual_seed(42),  # Seed for reproducibility
    ).frames[0]
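
For context on the scheduler comments above, the DPM scheduler can be swapped in before calling the pipeline. A minimal sketch, assuming the CogVideoXDPMScheduler class shipped in recent diffusers releases, with "THUDM/CogVideoX-2b" as an illustrative checkpoint:

# Sketch: pairing use_dynamic_cfg=True with the DPM scheduler, as the
# comments above suggest. The checkpoint name is illustrative.
import torch
from diffusers import CogVideoXPipeline, CogVideoXDPMScheduler

pipe = CogVideoXPipeline.from_pretrained("THUDM/CogVideoX-2b", torch_dtype=torch.float16)
pipe.scheduler = CogVideoXDPMScheduler.from_config(
    pipe.scheduler.config, timestep_spacing="trailing"
)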

loretoparisi Sep 08 '24 08:09

Oh, the output is indeed generated directly from noise. We didn't write any code to control this part, so the impact would be very, very small.
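
Concretely, "generated directly from noise" means the initial latents come from a seeded random tensor with no image mixed in; a rough sketch, where the latent shape is illustrative rather than CogVideo's exact internals:

# The generator seeds this noise, which is why the seed is controllable
# while a seed image is not: there is no code path that encodes an image
# into these latents. The shape below is illustrative.
import torch

generator = torch.Generator().manual_seed(42)
latents = torch.randn((1, 13, 16, 60, 90), generator=generator)  # pure noise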

zRzRzRzRzRzRzR Sep 10 '24 00:09

Ah, correct. So in any case you mean an image parameter would need to be added, as in Diffusers' Image2Image pipeline here: https://huggingface.co/docs/diffusers/v0.30.2/en/api/pipelines/auto_pipeline#diffusers.AutoPipelineForImage2Image
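
For reference, this is roughly how that Diffusers image-to-image interface accepts a seed image today; the checkpoint and image URL below are illustrative placeholders, and CogVideo has no equivalent parameter at this point:

# Sketch of the `image` parameter in AutoPipelineForImage2Image, the kind
# of seed-image input being requested here. The checkpoint and URL are
# illustrative placeholders.
import torch
from diffusers import AutoPipelineForImage2Image
from diffusers.utils import load_image

pipe = AutoPipelineForImage2Image.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

init_image = load_image("https://example.com/seed.png")  # hypothetical seed image
image = pipe(
    prompt="a fantasy landscape",
    image=init_image,  # the seed image conditioning the generation
    strength=0.75,     # how far to diffuse away from the seed image
    generator=torch.Generator("cuda").manual_seed(42),
).images[0]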

loretoparisi Sep 10 '24 13:09