Wenyi Hong comments

Results: 8 comments by Wenyi Hong

About 3D Swin Attention

Hi, the different attention channels are calculated independently and are later added up per token rather than per patch. As mentioned in Sec. 3.2 of our paper, the temporal...

Hello, the frames that CogVideo initially generates have a resolution of 160×160, and super-resolution can upscale them to 480×480. Because CogVideo uses the same VQ-VAE decoder as CogView2, it directly uses CogView2's super-resolution method. For details, see Ding, M., Zheng, W., Hong, W., & Tang, J. (2022). CogView2: Faster and Better Text-to-Image Generation via Hierarchical Transformers. arXiv preprint arXiv:2204.14217.

About 3D Swin Attention

super-resolution

Computation Requirement to train CogVideo

How many frames (seconds) are there in each video sample used in the training process?

Weird training results

'RuntimeError: CUDA out of memory.' when use RTX3080

Computation Requirement to train CogVideo

Computation Requirement to train CogVideo