
Using latents in CogVideoXPipeline pipeline

loretoparisi opened this issue 5 months ago · 2 comments

Feature request / 功能建议

I'm trying to pass the latents argument

latents (torch.FloatTensor, optional) — Pre-generated noisy latents, sampled from a Gaussian distribution, to be used as inputs for image generation. Can be used to tweak the same generation with different prompts. If not provided, a latents tensor will be generated by sampling using the supplied random generator.

to the pipeline, using frames encoded with the VAE encoder. Example:

encoded_frames = encode_video(model_path, image_path, dtype, device)
video = pipe(
    prompt=prompt,
    num_videos_per_prompt=num_videos_per_prompt,
    num_inference_steps=num_inference_steps,
    num_frames=num_frames,
    use_dynamic_cfg=True,
    guidance_scale=guidance_scale,
    output_type=output_type,
    generator=torch.Generator(device=device).manual_seed(seed),
    latents=encoded_frames,
)

but I'm getting a dimensionality error:

Given groups=1, weight of size [3072, 16, 2, 2], expected input[32, 2, 80, 80] to have 16 channels, but got 2 channels instead
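A possible explanation for this error (an assumption, not confirmed by the pipeline authors): the weight of size [3072, 16, 2, 2] belongs to the transformer's patch-embedding conv, which expects 16 latent channels, but the input arrives with only 2 channels, which looks like the frame axis landing in the channel slot. The CogVideoX VAE encodes video into latents laid out as [batch, channels, frames, height/8, width/8], while CogVideoXPipeline's internal latents use [batch, frames, channels, height/8, width/8], so the encoded frames may need their frame and channel axes swapped (with torch, `latents.permute(0, 2, 1, 3, 4)`), and possibly scaling by `vae.config.scaling_factor`. A minimal sketch of the axis swap, using numpy on a toy tensor with hypothetical dimensions:

```python
import numpy as np

# Toy stand-in for VAE output: [batch=1, channels=16, frames=13, h=60, w=90].
# (Shapes here are illustrative, not taken from the issue.)
vae_latents = np.zeros((1, 16, 13, 60, 90), dtype=np.float32)

# Move the frame axis in front of the channel axis, matching the
# [batch, frames, channels, h, w] layout the pipeline expects.
# With a torch tensor this would be vae_latents.permute(0, 2, 1, 3, 4).
pipeline_latents = np.transpose(vae_latents, (0, 2, 1, 3, 4))

print(pipeline_latents.shape)  # (1, 13, 16, 60, 90)
```

If the layout mismatch is indeed the cause, passing the permuted (and scaled) tensor as `latents=` should at least get past the channel-count check in the conv.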

Motivation / 动机

Add support for passing the latents parameter to CogVideoXPipeline.

Your contribution / 您的贡献

Tested VAE image encoding/decoding in https://github.com/THUDM/CogVideo/issues/249

loretoparisi · Sep 06 '24 20:09