
Video Prediction Results

Open · pravn opened this issue 4 years ago · 1 comment

@rosinality I am wondering whether we could reproduce the video prediction results in section 4.4. We would need to build sequential context from the latent frames, so we need a scheme to process latent frames over time - basically something like a recurrent seq2seq model.

https://arxiv.org/pdf/1711.00937.pdf

  1. Store the discrete latent space for each frame.
  2. Create a PixelCNN/PixelSNAIL encoder (can be done with the same setup as the PixelCNN prior in the code).
  3. Process each frame with PixelSNAIL and use the previous frame's output as context.
  4. Use an autoregressive or recurrent scheme to process the context for each frame.
  5. Decode new frames after creating context from the input frames (rough sketch below).
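
Something along these lines for steps 3-4 - a minimal sketch only, not tied to this repo's actual API: a small convolutional-GRU encoder that accumulates context over a sequence of stored latent code maps, whose output could then be passed as the conditioning input of a PixelCNN/PixelSNAIL-style prior. The module name `LatentContextEncoder`, the `n_embed`/`channel` defaults, and the assumption that the prior accepts a spatial `condition` tensor are all illustrative.

```python
# Sketch: build a per-position context tensor from past latent frames,
# to condition an autoregressive prior over the next frame's codes.
# n_embed / channel values and the prior's `condition` argument are assumptions.

import torch
from torch import nn


class LatentContextEncoder(nn.Module):
    """Recurrent encoder over sequences of discrete latent code maps.

    Each input frame is an integer map of VQ codes with shape [B, H, W].
    Codes are embedded, and a ConvGRU-style update accumulates context
    across time; the final hidden state is a [B, C, H, W] tensor that can
    be fed to a conditional PixelCNN/PixelSNAIL prior.
    """

    def __init__(self, n_embed=512, channel=128):
        super().__init__()
        self.embed = nn.Embedding(n_embed, channel)
        # gates of a simple convolutional GRU cell
        self.gates = nn.Conv2d(channel * 2, channel * 2, 3, padding=1)
        self.candidate = nn.Conv2d(channel * 2, channel, 3, padding=1)

    def forward(self, code_seq):
        # code_seq: [B, T, H, W] integer latent maps for T past frames
        batch, steps, height, width = code_seq.shape
        hidden = code_seq.new_zeros(
            batch, self.embed.embedding_dim, height, width, dtype=torch.float
        )

        for t in range(steps):
            x = self.embed(code_seq[:, t])          # [B, H, W, C]
            x = x.permute(0, 3, 1, 2).contiguous()  # [B, C, H, W]

            update, reset = torch.chunk(
                torch.sigmoid(self.gates(torch.cat([x, hidden], 1))), 2, dim=1
            )
            cand = torch.tanh(self.candidate(torch.cat([x, reset * hidden], 1)))
            hidden = (1 - update) * hidden + update * cand

        return hidden  # context for predicting the next frame's codes


if __name__ == "__main__":
    # toy check with random codes: 4 past frames of 32x32 top-level codes
    codes = torch.randint(0, 512, (2, 4, 32, 32))
    context = LatentContextEncoder()(codes)
    print(context.shape)  # torch.Size([2, 128, 32, 32])
    # `context` would then be passed as the conditioning tensor of the prior,
    # e.g. prior(next_codes, condition=context), assuming the prior accepts
    # a spatial conditioning input.
```

Predicted code maps for the new frame would then go through the trained VQ-VAE decoder to produce the frame itself, as in step 5.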

pravn · Jan 20 '21 16:01

Yes, I think you can do it the way you specified.

rosinality · Jan 21 '21 00:01