CogVideo

About Frame Pack & 3D RoPE

Open burnquiet opened this issue 1 year ago • 4 comments

Feature request

Thanks for your work! The paper mentions Frame Pack, which requires generating attention masks (https://arxiv.org/abs/2307.06304). However, the forward function sets: kwargs["input_ids"] = kwargs["position_ids"] = kwargs["attention_mask"] = torch.ones((1, 1)).to(x.dtype). I am confused about how this mask is generated. Also, is the 3D RoPE not used in inference?

Motivation

Understanding the code

Your contribution

burnquiet • Aug 07 '24 06:08

  1. The mask is set this way so that the sat framework uses full attention.
  2. Our 2B model does not use RoPE; the subsequent models do.
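
To illustrate the first point, here is a minimal, dependency-free sketch of why a 1 x 1 mask of ones amounts to full attention. It is not the sat implementation: the function name `masked_softmax`, the additive -10000 penalty, and the plain-Python stand-in for tensor broadcasting are all assumptions made for illustration.

```python
import math


def masked_softmax(scores, mask):
    """Row-wise softmax of `scores` under a 0/1 `mask`.

    `mask` is either a full rows x cols grid or a 1 x 1 grid that is
    broadcast to every position (a plain-Python stand-in for tensor
    broadcasting). Masked-out scores are pushed to about -10000, an
    additive-penalty convention assumed here.
    """
    weights = []
    for i, row in enumerate(scores):
        mask_row = mask[0] if len(mask) == 1 else mask[i]
        penalized = []
        for j, s in enumerate(row):
            m = mask_row[0] if len(mask_row) == 1 else mask_row[j]
            penalized.append(s * m - 10000.0 * (1.0 - m))
        top = max(penalized)  # subtract the max for numerical stability
        exps = [math.exp(p - top) for p in penalized]
        total = sum(exps)
        weights.append([e / total for e in exps])
    return weights


scores = [[0.5, 1.0, -0.3],
          [0.2, 0.1, 0.9],
          [1.5, -1.0, 0.0]]

ones_1x1 = [[1.0]]                         # analogue of torch.ones((1, 1))
ones_full = [[1.0] * 3 for _ in range(3)]  # explicit "attend everywhere" mask
causal = [[1.0 if j <= i else 0.0 for j in range(3)] for i in range(3)]

full_from_1x1 = masked_softmax(scores, ones_1x1)
full_from_grid = masked_softmax(scores, ones_full)
causal_weights = masked_softmax(scores, causal)
```

Under these assumptions, `full_from_1x1` and `full_from_grid` are identical: a mask of ones removes nothing, whatever its shape before broadcasting. Only a mask containing zeros, such as `causal`, changes the attention weights.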

tengjiayan20 • Aug 07 '24 12:08

Are you releasing any subsequent model soon?

Does your current code include implementation of NaViT?

jinhuaca • Aug 08 '24 13:08

It seems that, although the paper mentions NaViT, the open-sourced dataloader does not contain the relevant code:

https://github.com/THUDM/CogVideo/blob/main/sat/data_video.py

jinhuaca • Aug 08 '24 13:08

  1. See the "update and news" section.
  2. The NaViT-related code is only used for pretraining. We have released the code for inference and finetuning, which does not need it. Releasing the NaViT code is planned for later.
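
Since the pretraining code is not released, the following is only a generic sketch of the kind of attention mask that sequence packing usually needs, not the authors' method. The function name `packed_attention_mask` and its `lengths` argument are hypothetical: several samples are concatenated into one sequence, and the mask keeps each token from attending to tokens of other samples.

```python
def packed_attention_mask(lengths):
    """Build a block-diagonal 0/1 mask for samples packed back to back.

    Entry [i][j] is 1 only when tokens i and j belong to the same
    original sample, so packed samples cannot attend to each other.
    """
    # Segment id of every token in the packed sequence, e.g. [0, 0, 1, 1, 1].
    segment_ids = [seg for seg, n in enumerate(lengths) for _ in range(n)]
    return [[1 if a == b else 0 for b in segment_ids] for a in segment_ids]


# Two samples of 2 and 3 tokens packed into one sequence of 5 tokens.
mask = packed_attention_mask([2, 3])
for row in mask:
    print(row)
```

With a single sample, for example `packed_attention_mask([4])`, the mask is all ones, which is the full-attention case that inference and finetuning on one video at a time would need.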

tengjiayan20 • Aug 08 '24 16:08

Looking forward to the release of the NaViT code.

colian • Aug 23 '24 02:08

Looking forward to it too!!

skeletonNN • Aug 23 '24 08:08