VideoGPT
Next frame predictor
Hi, great work! I have a somewhat naive question.
Just curious: why not limit the VQ-VAE to modeling space only (one frame at a time) instead of space-time? The transformer's autoregressive nature could then be exploited to generate videos with a varying number of frames.
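
To make the idea concrete, here is a rough toy sketch of what I have in mind. It is not VideoGPT's code: `toy_next_token` is a hypothetical stand-in for a trained transformer, and `TOKENS_PER_FRAME` and `CODEBOOK_SIZE` are made-up values for a per-frame (2D) VQ-VAE. The only point is that the sampling loop can stop after any number of frames.

```python
import random

TOKENS_PER_FRAME = 4  # assumption: a 2D VQ-VAE encodes each frame to 4 codes
CODEBOOK_SIZE = 8     # assumption: size of the VQ-VAE codebook


def toy_next_token(prefix, rng):
    """Hypothetical stand-in for a transformer sampling p(token | prefix)."""
    # A real model would condition on `prefix`; this toy ignores it.
    return rng.randrange(CODEBOOK_SIZE)


def generate_video_tokens(num_frames, first_frame_tokens, seed=0):
    """Extend a flat token sequence autoregressively until num_frames exist."""
    assert len(first_frame_tokens) == TOKENS_PER_FRAME
    rng = random.Random(seed)
    tokens = list(first_frame_tokens)
    while len(tokens) < num_frames * TOKENS_PER_FRAME:
        tokens.append(toy_next_token(tokens, rng))
    # Regroup into per-frame code lists; each could be decoded
    # independently by the per-frame VQ-VAE decoder.
    return [
        tokens[i:i + TOKENS_PER_FRAME]
        for i in range(0, len(tokens), TOKENS_PER_FRAME)
    ]


# The same loop yields clips of different lengths from the same first frame.
frames_short = generate_video_tokens(3, [0, 1, 2, 3])
frames_long = generate_video_tokens(10, [0, 1, 2, 3])
```

Since only the stopping condition changes between the two calls, the clip length would be a sampling-time choice rather than something fixed by the VQ-VAE.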