Kim Seonghyeon
Maybe you can do it by conditioning on previous frames. I don't know much about video generation, sorry.
I think you can change the shape argument of PixelSNAIL.
Yes, a perceptual loss would be easy to try. But I think you can get quite nice results with MSE loss alone.
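A minimal sketch of what "try a perceptual loss on top of MSE" could look like. This is illustrative only, not the repo's training code: `features` here is a dummy stand-in for a real pretrained feature extractor (e.g. VGG activations), and `recon_loss`, `perceptual_weight` are hypothetical names.

```python
import numpy as np

def mse(a, b):
    # plain mean squared error in pixel space
    return ((a - b) ** 2).mean()

def features(x):
    # Placeholder "feature extractor": a fixed random projection.
    # In practice this would be activations from a pretrained network.
    rng = np.random.default_rng(42)
    w = rng.standard_normal((x.shape[-1], 16))
    return x @ w

def recon_loss(recon, target, perceptual_weight=0.0):
    loss = mse(recon, target)  # MSE alone already works reasonably well
    if perceptual_weight > 0:
        # optional perceptual term: MSE in feature space
        loss += perceptual_weight * mse(features(recon), features(target))
    return loss

x = np.ones((4, 32))
print(recon_loss(x, x))  # 0.0 for a perfect reconstruction
```

With `perceptual_weight=0` this reduces to the plain MSE objective mentioned above.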
I think it will be safer to use fp32 for the entire quantize operation.
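A sketch of the idea of keeping the quantize step in fp32 even when the rest of the network runs in reduced precision: up-cast both the activations and the codebook before the nearest-neighbor lookup. The names (`quantize_fp32`, `codebook`, `z`) are illustrative, not the repo's API, and numpy stands in for the actual mixed-precision framework.

```python
import numpy as np

def quantize_fp32(z, codebook):
    """Nearest-codebook lookup forced to fp32, regardless of input dtype."""
    z32 = z.astype(np.float32)          # up-cast fp16 activations
    cb32 = codebook.astype(np.float32)  # up-cast fp16 codebook entries
    # squared L2 distance from each vector to each codebook entry, in fp32
    dist = ((z32[:, None, :] - cb32[None, :, :]) ** 2).sum(-1)
    idx = dist.argmin(axis=1)
    return cb32[idx], idx

codebook = np.eye(4, dtype=np.float16)           # 4 toy one-hot code vectors
z = np.array([[0.9, 0.1, 0, 0]], dtype=np.float16)
q, idx = quantize_fp32(z, codebook)
print(idx)  # [0] -- the nearest code
```

In a PyTorch mixed-precision setup the equivalent move would be to disable autocast around the quantize call so these distance computations stay in fp32.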
Yes. It may work.
If it suffices to reproduce the results of fp32 training, it would definitely be nice to have.
Yes, it will generate samples of latent codes for the VQ-VAE. I checked that it can make some samples if you train it long enough. But you will need to use a quite large...
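A toy sketch of how an autoregressive prior samples a grid of latent codes one position at a time, which the VQ-VAE decoder would then turn into an image. `toy_logits` is a dummy stand-in for the real PixelSNAIL forward pass; the grid shape, vocabulary size, and function names are all assumptions, not the repo's interface.

```python
import numpy as np

def toy_logits(codes, r, c, n_codes):
    # Dummy conditional distribution over the next code, depending only on
    # the previously sampled code (a real model conditions on all of them).
    prev = codes[r, c - 1] if c > 0 else (codes[r - 1, -1] if r > 0 else 0)
    logits = np.zeros(n_codes)
    logits[(prev + 1) % n_codes] = 5.0
    return logits

def sample_codes(shape, n_codes, rng):
    codes = np.zeros(shape, dtype=np.int64)
    for r in range(shape[0]):            # raster-scan order: row by row,
        for c in range(shape[1]):        # left to right within each row
            logits = toy_logits(codes, r, c, n_codes)
            p = np.exp(logits - logits.max())
            p /= p.sum()                 # softmax over the code vocabulary
            codes[r, c] = rng.choice(n_codes, p=p)
    return codes

rng = np.random.default_rng(0)
codes = sample_codes((8, 8), 512, rng)
print(codes.shape)  # (8, 8)
```

The sampled integer grid would then be looked up in the VQ-VAE codebook and decoded to pixels.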
Not very nice, but it is from a somewhat smaller model than the model in the paper.
##### Top
* channel: 512
* n_block: 4
* n_res_block: 5
* res_channel: 512
* n_cond_res_block: 0
* n_out_res_block: 5
* attention: True
* dropout: 0.1
* batch size: 63...
@k-eak I have used 4 V100s with mixed precision training.