ML_Decoder
What is the ideal query initializer?
Dear Sir,
I am working on transformer models for multi-label image classification, and your paper "ML-Decoder: Scalable and Versatile Classification Head" caught my attention. However, there is one point in the paper and code that I could not understand: in your code, the queries fed to the decoder are random values and are set as non-learnable. What is the logic behind this? In other work I have seen, the queries are derived from some form of image or text input rather than being random. Is it reasonable to extract a relationship between image embeddings and random data with cross-attention? Could you explain the fundamental idea behind this? I would be grateful if you could help me understand this issue.
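To make the question concrete, here is a minimal sketch of the setup as I understand it. It is plain Python rather than the repository's code, all names (`cross_attention`, `queries`, `tokens`) are my own, and it leaves out the learned projection layers that a real attention layer would have. The queries are drawn once at random and never updated, and each one attends over the image tokens:

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, tokens):
    """Single-head scaled dot-product cross-attention, no learned projections.

    Each query produces one output vector: a weighted average of the tokens.
    """
    dim = len(tokens[0])
    outputs = []
    for q in queries:
        scores = [
            sum(qi * ti for qi, ti in zip(q, t)) / math.sqrt(dim)
            for t in tokens
        ]
        weights = softmax(scores)
        outputs.append(
            [sum(w * t[k] for w, t in zip(weights, tokens)) for k in range(dim)]
        )
    return outputs


rng = random.Random(0)
num_queries, num_tokens, dim = 3, 5, 4  # illustrative sizes only

# Fixed random queries: sampled once and treated as constants (non-learnable).
queries = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(num_queries)]

# Stand-in for image embeddings produced by a backbone (assumption).
tokens = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(num_tokens)]

outputs = cross_attention(queries, tokens)  # one output vector per query
```

In this sketch the random queries only decide how the image tokens are weighted, so my question is why such weights carry useful information when the queries themselves are never trained.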
Yours sincerely.