
[Question] Which K (or number of embeddings) to choose for VQ?

Open jjmartinezro opened this issue 5 years ago • 6 comments

Thanks for sharing your work, it is very useful.

I can see in your code that you use K=512, as in the paper, but it is not clear whether that is the number of categories in the dataset or an arbitrary number. I am trying to choose K for my dataset, where the number of categories is unknown. After trying some lower and some higher values, even though the VQ-VAE trains fine and learns to encode/decode using the latent code, it is not able to reproduce an image given the latent embedding for that very same image, or anything that resembles an image at all.

Regards:

Juanjo.

jjmartinezro avatar Aug 12 '20 19:08 jjmartinezro

It is rather an arbitrary number; you can think of it as the number of patterns for patches, or a kind of latent dimensionality. But could you tell me what you mean by not being able to replicate the image given the latent embeddings? As VQ-VAE generates the image solely from the latent codes, it should be able to generate images from them.
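For reference, K (n_embed) only sets the codebook size, i.e. how many discrete code vectors each spatial position of the encoder output can be snapped to. A minimal sketch of that lookup, loosely following the Quantize module in this repo (shapes and names here are illustrative, not the exact implementation):

import torch
import torch.nn.functional as F

n_embed = 512                          # K: codebook size, an arbitrary capacity choice
dim = 64                               # width of each code vector
codebook = torch.randn(dim, n_embed)   # dim x K table of code vectors

z = torch.randn(2, 32, 32, dim)        # encoder output, channels-last
flat = z.reshape(-1, dim)
# squared distance from every spatial position to every code vector
dist = flat.pow(2).sum(1, keepdim=True) - 2 * flat @ codebook + codebook.pow(2).sum(0, keepdim=True)
ids = dist.argmin(1).view(2, 32, 32)                      # discrete latent map, values in [0, K)
quantized = F.embedding(ids, codebook.transpose(0, 1))    # back to (B, H, W, dim)

So K is not tied to the number of classes in the dataset; it is more like a vocabulary size for the patch patterns.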

rosinality avatar Aug 13 '20 00:08 rosinality

That is what I understood from their paper.

My issue is this: once the encoder/decoder architecture has finished training, I take one image from the dataset and encode it, getting both the quantized encoding and the latent embedding (the one that maps each patch to one of the discrete vectors). If I decode the quantized encoding, the image is reconstructed quite accurately. But if I pass the latent embedding through Quantize.embed_code() and then upsample and decode the result, I don't get anything resembling the image at all; I get a noisy image with vertical stripes, as if the decoder were trying to find the image in there but failed.

jjmartinezro avatar Aug 13 '20 08:08 jjmartinezro

Could you let me know which code you embedded with Quantize.embed_code()?

rosinality avatar Aug 13 '20 11:08 rosinality

Sorry if I was not clear enough.

In the architecture I am using there is no top and bottom layer, just the bottom layer (as if it were VQ-VAE instead of VQ-VAE-2), with 256x256 images. I train the model and then run extract_code.py, storing the indices for that single layer, which in your code is:

_, _, _, id_t, id_b = model.encode(img)

but in my case using only one id. That goes into the db, and that id for each image is the one I pass to Quantize.embed_code(), which returns a code that I then upsample and decode to get the image.

My issue is: if I do this:

quant, id = model.extract_code(*data)

and then I upsample and decode quant, I do get the image. But if I take id and do:

Quantize.embed_code(id)

and then upsample that result and decode, I don't get the image or anything that makes sense.
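To make the comparison concrete, the two paths look roughly like this in my single-level adaptation (extract_code is from my own code, and upsample_and_decode is just a stand-in for the upsample + decoder call described above, not a function in this repo):

# path 1: works
quant, id = model.extract_code(*data)     # quant: quantized tensor, id: (B, H, W) indices
good = upsample_and_decode(quant)         # accurate reconstruction

# path 2: does not work for me
embedded = model.quantize.embed_code(id)  # rebuild the quantized tensor from the stored indices
bad = upsample_and_decode(embedded)       # noisy vertical stripes instead of the image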

Thanks for your help.

jjmartinezro avatar Aug 13 '20 11:08 jjmartinezro

Oh well, I just realized I was missing the permutation of the embedded code before doing the upsample. It works now, including sampling new images.
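For anyone hitting the same thing: the missing step was just the layout permute, since embed_code() returns the code channels-last while the upsample/decoder convs expect channels-first. Roughly (single-level setup, model.quantize and upsample_and_decode are names from my adaptation):

# id: (B, H, W) LongTensor of codebook indices from the db
quant = model.quantize.embed_code(id)   # (B, H, W, C): embed_code returns channels-last
quant = quant.permute(0, 3, 1, 2)       # (B, C, H, W): layout the conv layers expect
recon = upsample_and_decode(quant)      # now matches decoding the quant from path 1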

Thanks a lot for your time and help. Really appreciate it.

jjmartinezro avatar Aug 13 '20 11:08 jjmartinezro

Glad to hear that you solved the problem. 😁

rosinality avatar Aug 13 '20 11:08 rosinality