PoseGPT
Clarification on obtaining the embedding related to the <POSE> token
Hello! First of all, thank you for the great article. I have a question about how you obtain the embedding associated with the <POSE> token, which is then projected and used for human pose reconstruction. If I understand correctly, when the model outputs a <POSE> token, you take the logits from the last layer of the LLM (the ones to which softmax was applied and from whose resulting distribution the <POSE> token was sampled) and use those as the embedding?
I think it's the last-layer embedding (the hidden_states, before the logits) corresponding to the <POSE> token, not the logits themselves. You can refer to LISA, which uses the same mechanism for its <SEG> token: https://github.com/dvlab-research/LISA.
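In case it helps, here is a minimal sketch of that extraction using Hugging Face transformers. The checkpoint name, the prompt, and the projection output dimension are placeholders I made up for illustration; PoseGPT's actual base LLM and projection head may differ:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "meta-llama/Llama-2-7b-hf"  # placeholder; PoseGPT's base LLM may differ

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID)

# Register <POSE> as a new special token (analogous to LISA's <SEG>).
tokenizer.add_special_tokens({"additional_special_tokens": ["<POSE>"]})
model.resize_token_embeddings(len(tokenizer))
pose_token_id = tokenizer.convert_tokens_to_ids("<POSE>")

# A sequence that already contains <POSE> (e.g. a training target, or a
# generated sequence fed back through the model at inference time).
inputs = tokenizer("The person is sitting. <POSE>", return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs, output_hidden_states=True)

# outputs.hidden_states is a tuple of [batch, seq_len, hidden_dim] tensors,
# one per layer; [-1] is the final layer, i.e. the pre-logit hidden states.
last_hidden = outputs.hidden_states[-1]

# Gather the hidden state(s) at the <POSE> position(s).
pose_mask = inputs["input_ids"] == pose_token_id
pose_embedding = last_hidden[pose_mask]  # [num_pose_tokens, hidden_dim]

# That embedding is what gets projected into the pose decoder's input
# space; the output dimension 512 here is an arbitrary placeholder.
projection = torch.nn.Linear(model.config.hidden_size, 512)
pose_feature = projection(pose_embedding)
```

Note the contrast with the logits: the logits are just `lm_head(last_hidden)`, a distribution over the vocabulary, so sampling <POSE> from them tells you *that* a pose should be decoded but carries no dense representation. The hidden state at that position is what retains the contextual information used for reconstruction.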