Using latents in CogVideoXPipeline
Feature request
I'm trying to pass latents

> latents (torch.FloatTensor, optional) — Pre-generated noisy latents, sampled from a Gaussian distribution, to be used as inputs for image generation. Can be used to tweak the same generation with different prompts. If not provided, a latents tensor will be generated by sampling using the supplied random generator.

to the pipeline, built from frames encoded with the VAE encoder. For example:
```python
encoded_frames = encode_video(model_path, image_path, dtype, device)

video = pipe(
    prompt=prompt,
    num_videos_per_prompt=num_videos_per_prompt,
    num_inference_steps=num_inference_steps,
    num_frames=num_frames,
    use_dynamic_cfg=True,
    guidance_scale=guidance_scale,
    output_type=output_type,
    generator=torch.Generator(device=device).manual_seed(seed),
    latents=encoded_frames,
)
```
but I'm facing a dimensionality error:

```
Given groups=1, weight of size [3072, 16, 2, 2], expected input[32, 2, 80, 80] to have 16 channels, but got 2 channels instead
```
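The shapes in the error suggest a layout mismatch rather than wrong content: the patch-embedding conv expects 16 latent channels, and 32 = 2 × 16 looks like batch × channels being flattened as if channels were frames. A minimal sketch of a possible workaround, assuming AutoencoderKLCogVideoX returns latents as [batch, channels, frames, height, width] while CogVideoXPipeline.prepare_latents builds them as [batch, frames, channels, height, width]:

```python
# Sketch only, based on the layout mismatch the error suggests.
# VAE encoder output: [B, C, F, H, W]; pipeline `latents` argument: [B, F, C, H, W].
latents = encoded_frames.permute(0, 2, 1, 3, 4).contiguous()
# decode_latents divides by vae.config.scaling_factor, so the inverse
# transform multiplies by it (assumption drawn from the pipeline source).
latents = latents * pipe.vae.config.scaling_factor
latents = latents.to(device=device, dtype=pipe.transformer.dtype)

video = pipe(
    prompt=prompt,
    latents=latents,
    # ... remaining arguments as in the call above
)
```

Note that the docstring quoted above says the latents should be *noisy*; clean VAE latents passed as-is skip the noise the scheduler expects, so a noise-injection step may still be needed on top of the permute.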
Motivation
Add support for the latents parameter in CogVideoXPipeline.
Your contribution
Tested VAE image encoding/decoding: https://github.com/THUDM/CogVideo/issues/249
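For completeness, a sketch of what an encode_video helper along these lines might look like; the actual helper isn't shown in this issue, so the loading and normalization details here are assumptions:

```python
import numpy as np
import torch
from diffusers import AutoencoderKLCogVideoX
from diffusers.utils import load_video

def encode_video(model_path, video_path, dtype, device):
    # Load the CogVideoX 3D VAE from the model repo (standard diffusers layout).
    vae = AutoencoderKLCogVideoX.from_pretrained(
        model_path, subfolder="vae", torch_dtype=dtype
    ).to(device)
    frames = load_video(video_path)  # list of PIL.Image frames
    # [F, H, W, C] uint8 -> [1, C, F, H, W] float in [-1, 1]
    video = torch.from_numpy(np.stack([np.array(f) for f in frames]))
    video = video.permute(3, 0, 1, 2).unsqueeze(0).to(device, dtype)
    video = video / 127.5 - 1.0
    with torch.no_grad():
        latents = vae.encode(video).latent_dist.sample()
    return latents  # [B, 16, F_latent, H // 8, W // 8]
```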