vq-vae-2-pytorch
Video Prediction Results
@rosinality I am wondering whether we could reproduce the results in Section 4.4. We would need to build sequential context from the latent frames, so some scheme for processing them is required - essentially something like a recurrent seq2seq model.
https://arxiv.org/pdf/1711.00937.pdf
- Store the discrete latent codes for each frame.
- Create a PixelCNN/PixelSNAIL encoder (this can be done with the same setup as the PixelCNN prior in the code).
- Process each frame with PixelSNAIL and use the previous frame's output as context.
- Use an autoregressive or recurrent scheme to process the context for each frame.
- Decode new frames after building context from the input frames.
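The recurrent context step above could look roughly like the following sketch. All names (`FrameContextRNN`, `n_embed`, `hidden_dim`, etc.) are hypothetical and not from this repo; it only illustrates embedding each frame's discrete codes and summarizing the sequence with a simple convolutional recurrence, whose final hidden state would condition the prior over the next frame's codes:

```python
import torch
import torch.nn as nn

class FrameContextRNN(nn.Module):
    """Hypothetical sketch: embed each frame's discrete latent codes and
    fold the sequence into a spatial hidden state via a conv recurrence.
    The final hidden state is the conditioning context for the next frame."""

    def __init__(self, n_embed=512, embed_dim=64, hidden_dim=128):
        super().__init__()
        self.embed = nn.Embedding(n_embed, embed_dim)
        # simple recurrence: h_t = tanh(conv([embed(x_t), h_{t-1}]))
        self.update = nn.Conv2d(embed_dim + hidden_dim, hidden_dim, 3, padding=1)
        self.hidden_dim = hidden_dim

    def forward(self, codes):
        # codes: (B, T, H, W) long tensor of code indices per frame
        B, T, H, W = codes.shape
        h = codes.new_zeros(B, self.hidden_dim, H, W, dtype=torch.float)
        for t in range(T):
            e = self.embed(codes[:, t]).permute(0, 3, 1, 2)  # (B, embed_dim, H, W)
            h = torch.tanh(self.update(torch.cat([e, h], dim=1)))
        return h  # (B, hidden_dim, H, W): context for the next-frame prior


codes = torch.randint(0, 512, (2, 4, 8, 8))  # 2 clips, 4 frames of 8x8 codes
ctx = FrameContextRNN()(codes)
print(ctx.shape)  # torch.Size([2, 128, 8, 8])
```

In practice this context map would be fed into the PixelSNAIL prior as a per-position conditioning input when sampling the next frame's codes, which are then decoded back to pixels with the VQ-VAE decoder.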
Yes, I think you can do it the way you specified.