Stable-Pix2Seq Extract token embedding

Open elituan opened this issue 2 years ago • 0 comments

I want to extract the token embedding as shown in figure 11 of the paper.

However, looking at the code, I see that the tokens are predicted by feeding the output feature map to an MLP whose last layer has dimension 2003 (presumably the number of tokens). Hence, the model does not actually learn a token embedding, so there is no learned token embedding to extract.
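
To make the question concrete, here is a minimal NumPy sketch of what I understand the last layer to do. It is not the repository's code: `HIDDEN_DIM`, the random weights, and the function names are assumptions for illustration only. The one thing it shows is that a final linear layer with 2003 outputs has a weight matrix with one row per token, so the closest thing to a per-token vector I can find would be those rows (in PyTorch, the `weight` of the final `nn.Linear`, which has shape `(out_features, in_features)`).

```python
import numpy as np

VOCAB_SIZE = 2003  # output dimension of the last MLP layer, as seen in the code
HIDDEN_DIM = 256   # assumed hidden size, for illustration only

rng = np.random.default_rng(0)

# Random stand-ins for the trained parameters of the last linear layer.
W = rng.standard_normal((VOCAB_SIZE, HIDDEN_DIM))
b = np.zeros(VOCAB_SIZE)


def token_logits(features):
    """Map decoder features of shape (n, HIDDEN_DIM) to logits of shape (n, VOCAB_SIZE)."""
    return features @ W.T + b


def output_token_vectors():
    """Return one vector per token: row k of W is the vector that scores token k."""
    return W.copy()


features = rng.standard_normal((4, HIDDEN_DIM))
logits = token_logits(features)
vectors = output_token_vectors()

print(logits.shape)   # (4, 2003)
print(vectors.shape)  # (2003, 256)
```

The logit for token `k` is the dot product of a feature with row `k` of `W` plus a bias, so the rows act like output-side token vectors. Whether these are what the figure in the paper visualizes is exactly what I am unsure about.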

Am I missing something?

Sep 26 '22 14:09 elituan
