PoseGPT

Clarification on obtaining the embedding related to the <POSE> token

AndrejHafner opened this issue 1 year ago

Hello! First of all, thank you for the great article. I have a question about how you obtain the embedding for the <POSE> token, which is then projected and used for human pose reconstruction. If I understand correctly, when the model outputs a <POSE> token, you take the logits from the last layer of the LLM (to which softmax was applied, and from the resulting distribution the <POSE> token was sampled) and use those as the embedding?

AndrejHafner avatar Dec 26 '23 20:12 AndrejHafner

I think it's the last-layer embedding (the hidden_states, before the logits) corresponding to the <POSE> token, not the logits themselves. You can refer to LISA, which uses the same mechanism: https://github.com/dvlab-research/LISA.
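Concretely, the idea — take the final-layer hidden state at the <POSE> token's position rather than the post-softmax logits, then project it — can be sketched roughly as below. The tensor sizes, the `POSE_TOKEN_ID`, and the linear projection are illustrative placeholders, not PoseGPT's or LISA's actual code:

```python
import numpy as np

# Illustrative sizes (assumptions, not the real model dimensions)
seq_len, hidden_dim, pose_dim = 8, 16, 4
POSE_TOKEN_ID = 99  # hypothetical id of the special <POSE> token

rng = np.random.default_rng(0)

# Final-layer hidden states for one generated sequence: [seq_len, hidden_dim].
# In a HuggingFace-style model these would come from output.hidden_states[-1],
# NOT from the logits or the sampling distribution.
hidden_states = rng.standard_normal((seq_len, hidden_dim))

# Generated token ids; one of them is the special <POSE> token.
token_ids = np.array([5, 17, 99, 3, 8, 2, 11, 6])

# 1) Locate the <POSE> token in the generated sequence.
pose_pos = int(np.flatnonzero(token_ids == POSE_TOKEN_ID)[0])

# 2) Take the last-layer embedding at that position.
pose_embedding = hidden_states[pose_pos]   # shape: (hidden_dim,)

# 3) Project it into the pose decoder's input space
#    (a single linear map here, purely as a placeholder).
W = rng.standard_normal((hidden_dim, pose_dim))
pose_feature = pose_embedding @ W          # shape: (pose_dim,)

print(pose_pos, pose_feature.shape)
```

The key point is step 2: the embedding is read off before the LM head, so it lives in the model's hidden-state space, not in vocabulary-probability space.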

JJJYmmm avatar Jan 24 '24 09:01 JJJYmmm