Question about the VAE upsampling

Open goddice opened this issue 5 months ago • 0 comments

Hi, I am trying to understand the logic of CogVideoXUpsample3D. It looks like, for tensors with an odd t dimension, the first frame is split off and upsampled spatially only. (https://github.com/huggingface/diffusers/blob/2b443a5d621bd65f5cbf854195aef29cedd24058/src/diffusers/models/upsampling.py#L386)

Could you explain the purpose of this? Is it meant to preserve the parity of the t dimension?
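To make the parity question concrete, here is a minimal sketch of the frame counts I would expect. It rests on my reading of the linked branch, not on confirmed behavior: I am assuming the first frame stays a single frame while the remaining frames are doubled along t, that an even t is simply doubled, and that a single frame stays single. The function name `upsampled_frames` is hypothetical and only does the arithmetic; it does not call diffusers.

```python
def upsampled_frames(t: int) -> int:
    """Expected number of frames after temporal upsampling.

    Assumed behavior (my reading of the linked code, not confirmed):
      - odd t > 1: first frame kept as one frame, the other t - 1 doubled
      - even t:    all frames doubled
      - t == 1:    single frame, spatial upsampling only
    """
    if t < 1:
        raise ValueError("t must be a positive frame count")
    if t > 1 and t % 2 == 1:
        return 1 + 2 * (t - 1)
    if t > 1:
        return 2 * t
    return 1


if __name__ == "__main__":
    for t in (1, 3, 5, 7):
        out = upsampled_frames(t)
        # Under the assumptions above, an odd input count gives an odd output count.
        print(f"t={t} -> {out} (odd: {out % 2 == 1})")
```

If that reading is right, an odd t maps to 2t - 1, which is again odd, so the split would indeed keep the parity of the t dimension.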

Thanks!

goddice · Sep 19 '24 21:09