
RQ-VAE: How can I get a list of all learned codebook vectors (as indexed in the "indices")?

Open christophschuhmann opened this issue 2 years ago • 1 comments

Hi Lucid, I am working on quantizing CLIP image embeddings with your RQ-VAE. It works pretty well.

Next I want to take all learned codebook vectors and add them to the vocab of a GPT (as frozen token embeddings).

The idea is to train a GPT with CLIP image embeddings in between texts, e.g. IMAGE-CAPTION or TEXT-IMAGE-TEXT-IMAGE-... (Flamingo-style).

If this works, then GPT could maybe also learn to generate quantized CLIP IM embeddings token by token --> and then e.g. show images through a.) retrieval or b.) a DALLE 2 decoder :)

... So my question is: once the RQ-VAE is trained and I can get the quantized reconstructions and indices, how can I get a list or tensor of the actual codebook? (all possible vectors from the rq-vocab) :)

christophschuhmann avatar Oct 03 '22 13:10 christophschuhmann

+1 I can reverse engineer the forward function, but it'd be nice if there was an easy function call I'm missing

Edit: ended up reverse engineering it anyway :-) You can look up the codes from indices with quantizer.layers[i]._codebook.embed[0, tokens_ids[:, i]] for each layer i of the residual vector quantizer. As a bonus, you can reconstruct the input (image / audio / etc.) by summing the looked-up codes across layers:

decoded_vector = 0.0
for i, layer in enumerate(quantizer.layers):
    decoded_vector = decoded_vector + layer._codebook.embed[0, tokens_ids[:, i]]
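To make the idea concrete without depending on the library's internals, here is a minimal NumPy sketch of the same reconstruction logic: each residual layer has its own codebook, and the decoded vector is just the sum of one looked-up code per layer. The names (codebooks, indices) and the random data are illustrative, not part of vector-quantize-pytorch's API.

```python
import numpy as np

rng = np.random.default_rng(0)
num_layers, codebook_size, dim, batch = 3, 8, 4, 2

# One codebook per residual layer, analogous to layer._codebook.embed[0]
codebooks = [rng.normal(size=(codebook_size, dim)) for _ in range(num_layers)]

# indices[:, i] holds the chosen code id in layer i, analogous to tokens_ids[:, i]
indices = rng.integers(0, codebook_size, size=(batch, num_layers))

# Reconstruction: accumulate the selected code from each layer's codebook
decoded = np.zeros((batch, dim))
for i, codebook in enumerate(codebooks):
    decoded += codebook[indices[:, i]]
```

The key point is that residual quantization decodes additively, so once you have all the per-layer codebooks and the per-layer indices, no other state from the model is needed.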

kradonneoh avatar Oct 07 '22 04:10 kradonneoh

@christophschuhmann @kradonneoh oh hey! nice to hear that the library is working well for your use case

I've added a feature to return all the codes across the quantization layers here: https://github.com/lucidrains/vector-quantize-pytorch/commit/ec2474608816a3752b13dc826a19b8966d98804e

lucidrains avatar Oct 26 '22 17:10 lucidrains