Potential mistake in CogVideoX Paper's Figure 2(a) Regarding 3D VAE Sampling

Open rayleichenxi opened this issue 1 year ago • 3 comments

Hi~ thank you for open-sourcing this, it's great work. I am currently reading the CogVideoX paper downloaded from the resources/ folder of this repo. I wonder whether there is a mistake in Figure 2(a): shouldn't the encoder of the 3D VAE compress the video with 2x downsampling instead of 2x upsampling, and, similarly, shouldn't the decoder perform 2x upsampling?

image
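
To make the expected direction concrete, here is a minimal shape-tracing sketch. It is not taken from the CogVideoX code: the stage layout (two stages that halve both time and space, then one stage that halves space only) is an assumption for illustration, the names `ENCODER_STAGES`, `encode_shape`, and `decode_shape` are hypothetical, and causal handling of the first frame is ignored.

```python
from typing import List, Tuple

Shape = Tuple[int, int, int]  # (frames, height, width)

# Hypothetical stage layout: (temporal_factor, spatial_factor) per encoder stage.
# Assumed for illustration: two stages halve time and space, one halves space only.
ENCODER_STAGES: List[Tuple[int, int]] = [(2, 2), (2, 2), (1, 2)]


def encode_shape(shape: Shape, stages: List[Tuple[int, int]] = ENCODER_STAGES) -> Shape:
    """Trace a shape through the encoder: every stage downsamples (divides)."""
    t, h, w = shape
    for ft, fs in stages:
        t, h, w = t // ft, h // fs, w // fs
    return (t, h, w)


def decode_shape(shape: Shape, stages: List[Tuple[int, int]] = ENCODER_STAGES) -> Shape:
    """Trace a shape through the decoder: mirror the encoder and upsample (multiply)."""
    t, h, w = shape
    for ft, fs in reversed(stages):
        t, h, w = t * ft, h * fs, w * fs
    return (t, h, w)


video = (48, 480, 720)           # example input size, chosen to divide evenly
latent = encode_shape(video)     # encoder shrinks the video
restored = decode_shape(latent)  # decoder grows it back
print(latent)    # (12, 60, 90)
print(restored)  # (48, 480, 720)
```

Under these assumptions the encoder only ever shrinks the tensor and the decoder only ever grows it, which is why the "2x Upsample" labels on the encoder side of the figure look swapped.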

rayleichenxi avatar Aug 06 '24 06:08 rayleichenxi

Thank you for carefully pointing out the typo. You are right, and we will correct it.

tengjiayan20 avatar Aug 06 '24 06:08 tengjiayan20

Screenshot 2024-08-07 18 01 56

Additionally, the sentence in the second paragraph of Section 2.1 of the paper (highlighted in blue) seems to be inconsistent with the description of Figure 2. Should "the first two rounds of downsampling and upsampling" correspond to the top two in the blue box and the bottom two in the yellow box?

kyrie111 avatar Aug 07 '24 10:08 kyrie111

You are right. We will update the wording in the official version of the paper to avoid ambiguity.

tengjiayan20 avatar Aug 07 '24 12:08 tengjiayan20