vq-vae-2-pytorch
Conditioned Sample
In the PixelSNAIL paper, the model can generate samples conditioned on some global condition h. I am just wondering whether this is possible in the current state of the implementation?
Sorry for the late reply. Currently this implementation does not support class-conditional generation. Some modification would be needed, such as injecting conditions into the top PixelSNAIL network.
Hello, if I train on natural images, can I use your network to generate natural images (when running sample.py)? Second, would it be viable to add conditions to your network if I want to generate certain types of natural images? Thanks.
Yes, you can use it for that, as PixelSNAIL itself was used for natural images. You can also use conditions with it; in fact, the PixelSNAIL for the bottom code is conditioned on the top code. But if you want to use conditions like categories, other approaches might be more appropriate. (This implementation uses spatial feature grids of the top codes as conditions.)
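To make the "spatial feature grids of the top codes" idea concrete, here is a minimal sketch of that conditioning pattern: the discrete top code is embedded into an NCHW feature map and upsampled to the bottom code's resolution, where it can be added to the bottom prior's hidden state. The class name, embedding size, and channel width are illustrative assumptions, not the repository's actual API.

```python
import torch
from torch import nn
import torch.nn.functional as F

class TopCodeCondition(nn.Module):
    """Sketch: embed discrete top codes into a spatial feature grid."""

    def __init__(self, n_embed=512, channel=128):
        super().__init__()
        self.embed = nn.Embedding(n_embed, channel)

    def forward(self, top_code, bottom_size):
        # top_code: LongTensor of shape (N, H_top, W_top)
        cond = self.embed(top_code)       # (N, H_top, W_top, C)
        cond = cond.permute(0, 3, 1, 2)   # (N, C, H_top, W_top)
        # upsample to the bottom code's spatial resolution
        return F.interpolate(cond, size=bottom_size, mode='nearest')

cond_net = TopCodeCondition()
top_code = torch.randint(0, 512, (2, 32, 32))
cond = cond_net(top_code, bottom_size=(64, 64))  # (2, 128, 64, 64)
```

The bottom prior then consumes `cond` alongside its own autoregressive input, so sampling the bottom code is guided by the already-sampled top code.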
Thank you for your reply! My idea is to use a signal (4000-dimensional) as a condition. Can I modify your network to achieve this?
Yes. You can try several methods to incorporate it as a condition.
Thanks! Do you mean that I can do the same thing at the top as was done at the bottom, and then supply the conditions as vectors to the top? If so, do I need to modify the loss function? My goal is to generate the corresponding image just by giving a signal to the top layer as the condition at test time. Is this kind of sampling feasible?
There are many options. You can use the conditioning mechanism in the current implementation; you only need to transform your signal vector into spatial (2D, NCHW) feature maps. (You can simply tile the tensor or use nearest-neighbor upsampling.) You will need to replace these lines, https://github.com/rosinality/vq-vae-2-pytorch/blob/master/pixelsnail.py#L416, which convert discrete codes to spatial feature maps, with your own code. I think this will work.
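A minimal sketch of the tiling approach mentioned above (illustrative, not the repository's code): a 1D condition vector is broadcast over the spatial grid so it has the NCHW shape the prior expects in place of the embedded-code condition. A linear projection (not shown) could first reduce the 4000-dim signal to a manageable channel width.

```python
import torch

def tile_condition(vec, height, width):
    """Tile a (N, C) condition vector into an (N, C, H, W) feature map."""
    n, c = vec.shape
    # view as (N, C, 1, 1), then broadcast over the spatial grid
    return vec.view(n, c, 1, 1).expand(n, c, height, width)

signal = torch.randn(4, 64)                # e.g. a projected condition signal
cond_map = tile_condition(signal, 32, 32)  # (4, 64, 32, 32)
```

Every spatial position then carries the same condition vector, which is exactly what a global (non-spatial) condition should look like to a convolutional prior.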
You can also use the method that applies affine transforms on the channel dimension, which is widely used in GANs. You can refer to FiLM (https://arxiv.org/abs/1709.07871), SNGAN (https://arxiv.org/abs/1802.05637)/BigGAN (https://arxiv.org/abs/1809.11096), and StyleGAN (https://arxiv.org/abs/1812.04948). This needs more modification, but it should not be very hard.
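For reference, here is a hedged sketch of the FiLM-style channel-wise affine conditioning those papers use: a small network predicts a per-channel scale (gamma) and shift (beta) from the condition vector and applies them to a feature map. Layer names and sizes are assumptions for illustration.

```python
import torch
from torch import nn

class FiLM(nn.Module):
    """Sketch of feature-wise linear modulation (FiLM-style conditioning)."""

    def __init__(self, cond_dim, n_channel):
        super().__init__()
        # one linear layer predicts both gamma and beta per channel
        self.to_gamma_beta = nn.Linear(cond_dim, n_channel * 2)

    def forward(self, feature, cond):
        # feature: (N, C, H, W), cond: (N, cond_dim)
        gamma, beta = self.to_gamma_beta(cond).chunk(2, dim=1)
        gamma = gamma.unsqueeze(-1).unsqueeze(-1)  # (N, C, 1, 1)
        beta = beta.unsqueeze(-1).unsqueeze(-1)
        # (1 + gamma) keeps the layer near identity at initialization
        return (1 + gamma) * feature + beta

film = FiLM(cond_dim=4000, n_channel=128)
feat = torch.randn(2, 128, 16, 16)
cond = torch.randn(2, 4000)
out = film(feat, cond)  # same shape as feat
```

Compared with tiling, this injects the condition multiplicatively at chosen layers rather than concatenating it as an extra input, which is why it requires more surgery on the network.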
Thanks a lot! I will follow your suggestion and try it.
In case you want to generate an image based on a text description, how would you condition the model? Would this be necessary only in the top layer of the PixelSNAIL, or also in the VQ-VAE? Thank you.
@maan198 I think conditioning prior (PixelSNAIL) would be enough.
Thanks for the details. Could you please elaborate on the conditioning mechanism in the current implementation? If I want to condition on a source image, can I use the same mechanism in the top layer of PixelSNAIL, i.e., condition the sampling of the top code on features extracted from the source image?