video-diffusion-pytorch
What are the lengths (in frames) of x^a and x^b when sampling autoregressively?
My understanding is that, since the model is trained on 16-frame videos, the lengths of x^a and x^b should sum to 16. But I'm not sure how long each one is individually.
For example, we could first generate a video x^a ∼ p_θ(x) consisting of 16 frames, and then extend it with a second sample x^b ∼ p_θ(x^b|x^a).
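To make the frame arithmetic behind this assumption concrete, here is a small sketch that lists which frame indices serve as x^a (conditioning) and x^b (newly sampled) at each step. It assumes, as the question does, that every model call covers a 16-frame window, so the conditioning and new frames together fill that window. The function name `extension_windows` and the default split of 8 conditioning frames (`n_cond=8`) are hypothetical choices for illustration only, not values taken from the paper or from video-diffusion-pytorch.

```python
from typing import List, Tuple


def extension_windows(total_frames: int, window: int = 16, n_cond: int = 8) -> List[Tuple[range, range]]:
    """Hypothetical autoregressive schedule (not the library's API).

    Step 0 samples `window` frames unconditionally (x^a ~ p_theta(x)).
    Each later step conditions on the last `n_cond` generated frames (x^a)
    and samples `window - n_cond` new frames (x^b ~ p_theta(x^b | x^a)).
    """
    if not 0 < n_cond < window:
        raise ValueError("n_cond must satisfy 0 < n_cond < window")

    first = min(window, total_frames)
    steps = [(range(0), range(0, first))]  # no conditioning frames on step 0
    generated = first
    n_new = window - n_cond  # so n_cond + n_new == window for a full step

    while generated < total_frames:
        cond = range(generated - n_cond, generated)  # x^a: most recent frames
        # x^b: the next frames, truncated so we never pass total_frames
        new = range(generated, min(generated + n_new, total_frames))
        steps.append((cond, new))
        generated = new.stop
    return steps


if __name__ == "__main__":
    for cond, new in extension_windows(total_frames=32, window=16, n_cond=8):
        print(f"x^a frames {list(cond)} -> x^b frames {list(new)}")
```

Under this hypothetical 8/8 split, extending to 32 frames takes one unconditional 16-frame sample followed by two conditional steps of 8 new frames each. A larger `n_cond` gives each step more context but fewer new frames per model call; the actual split used in the paper is not stated in this thread.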