What are the lengths (in frames) of x^a and x^b when sampling autoregressively?

Open eri24816 opened this issue 3 years ago • 1 comment

My understanding is that, since the model is trained on 16-frame videos, the lengths of x^a and x^b should sum to 16. But I'm not sure how long each of them is individually.

For example, we could first generate a video x^a ∼ p_θ(x) consisting of 16 frames, and then extend it with a second sample x^b ∼ p_θ(x^b|x^a).
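To make the question concrete, here is a minimal sketch of the frame bookkeeping as I picture it. Everything in it is an assumption for illustration: `sample_block` is a hypothetical stand-in that returns random frames rather than a real model call, and `len_a` (conditioning frames) and `len_b` (newly generated frames) are exactly the two numbers I am asking about.

```python
import numpy as np


def sample_block(num_frames, cond=None, rng=None):
    """Hypothetical stand-in for the sampler, NOT the real model.

    Returns random frames of shape (num_frames, 8, 8, 3). `cond` is accepted
    only to mirror p(x^b | x^a); this placeholder ignores it.
    """
    rng = rng if rng is not None else np.random.default_rng(0)
    return rng.standard_normal((num_frames, 8, 8, 3))


def extend_autoregressively(total_frames, len_a, len_b, rng=None):
    """Generate x^a, then keep appending x^b blocks until total_frames is reached."""
    rng = rng if rng is not None else np.random.default_rng(0)
    video = sample_block(len_a, rng=rng)          # x^a ~ p(x)
    while video.shape[0] < total_frames:
        cond = video[-len_a:]                     # most recent len_a frames act as x^a
        x_b = sample_block(len_b, cond=cond, rng=rng)  # x^b ~ p(x^b | x^a)
        video = np.concatenate([video, x_b], axis=0)
    return video[:total_frames]                   # trim any overshoot


# Reading 1 (assumed): x^a and x^b share one 16-frame window, e.g. 8 + 8.
video_shared = extend_autoregressively(40, len_a=8, len_b=8)

# Reading 2 (assumed): x^a alone is 16 frames; len_b=16 is only a guess.
video_full = extend_autoregressively(40, len_a=16, len_b=16)
```

Both calls produce a 40-frame array, so the sketch does not settle which split is intended; it only shows where the two lengths enter the loop.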

Sep 30 '22 02:09 eri24816