VideoTetris Clarification of the arxiv paper content

Open DachengLi1 opened this issue 7 months ago • 0 comments

Thanks for the great work! Amazing results. I was looking at the arXiv paper and had some difficulty understanding some key concepts. Could you kindly help me clarify them?

I understand that (Section 3.1), given a written user prompt, you decompose it spatially and locally, compute the attention scores for each part separately, and then merge them. But I am not sure where the autoregressive part is. In this framework, how is the progressive following ability achieved?
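To make my reading concrete, here is a minimal sketch of the decompose / attend / merge step as I understand it. Everything in it is my own assumption rather than code from the paper or the repo: the names `cross_attention` and `compose`, the hard 0/1 region masks, and the use of plain scaled dot-product attention are all hypothetical.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Scaled dot-product attention: one output vector per query (pixel)."""
    scale = math.sqrt(len(keys[0]))
    dim = len(values[0])
    out = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / scale for k in keys]
        weights = softmax(scores)
        out.append([sum(w * v[d] for w, v in zip(weights, values))
                    for d in range(dim)])
    return out


def compose(queries, sub_prompts, masks):
    """Attend to each sub-prompt separately, then merge by region mask.

    sub_prompts: list of (keys, values) pairs, one per sub-prompt.
    masks: one list per sub-prompt with a weight per pixel (assumed 0 or 1).
    """
    dim = len(sub_prompts[0][1][0])
    merged = [[0.0] * dim for _ in queries]
    for (keys, values), mask in zip(sub_prompts, masks):
        out = cross_attention(queries, keys, values)
        for p, m in enumerate(mask):
            for d in range(dim):
                merged[p][d] += m * out[p][d]
    return merged


# Toy example: two pixels, two sub-prompts with one token each.
pixels = [[1.0, 0.0], [0.0, 1.0]]
sub_a = ([[1.0, 0.0]], [[1.0, 2.0]])  # (keys, values) for sub-prompt A
sub_b = ([[0.0, 1.0]], [[3.0, 4.0]])  # (keys, values) for sub-prompt B
merged = compose(pixels, [sub_a, sub_b], masks=[[1, 0], [0, 1]])
print(merged)  # [[1.0, 2.0], [3.0, 4.0]]
```

In this sketch each pixel simply takes the attention output of the sub-prompt whose region covers it, and nothing here depends on previously generated frames, which is why I cannot see where the autoregressive part enters.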

Thank you!

Jul 20 '24 21:07 DachengLi1
