ML_Decoder

What is the ideal query initializer?

Open osivaz61 opened this issue 1 year ago • 0 comments

Dear Sir,

I am working on transformer models for multi-label image classification, and your paper "ML-Decoder: Scalable and Versatile Classification Head" caught my attention. However, there is one point in the paper and code that I could not understand: in your code, the queries are initialized with random values and set as non-learnable. What is the logic behind this? In most work I have seen, the queries are derived from images or text rather than being random. Is it reasonable to use cross-attention to extract relationships between image embeddings and random data? Could you explain the fundamental idea behind this? I would be grateful for any help in understanding it.
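
To make the question concrete, here is a minimal NumPy sketch of the setup as I understand it. It is only an illustration and not the repository's code: the function `cross_attention`, the names `fixed_queries`, `w_k`, `w_v`, and all sizes are hypothetical, and it uses a single attention head. The queries are sampled once and never updated, while the key/value projections stand in for the parts that would be trained.

```python
import numpy as np


def softmax(x, axis=-1):
    # Subtract the max for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention(queries, tokens, w_k, w_v):
    """Single-head cross-attention: each query attends over the image tokens."""
    keys = tokens @ w_k      # (num_tokens, dim)
    values = tokens @ w_v    # (num_tokens, dim)
    scores = queries @ keys.T / np.sqrt(queries.shape[-1])  # (num_queries, num_tokens)
    weights = softmax(scores, axis=-1)  # each row sums to 1
    return weights @ values, weights


rng = np.random.default_rng(0)
num_queries, num_tokens, dim = 5, 49, 16  # hypothetical sizes

# Queries: drawn once at random and then kept fixed (non-learnable).
fixed_queries = rng.standard_normal((num_queries, dim))

# Stand-ins for the trainable key/value projection weights.
w_k = rng.standard_normal((dim, dim)) / np.sqrt(dim)
w_v = rng.standard_normal((dim, dim)) / np.sqrt(dim)

# Stand-in for the spatial embeddings produced by an image backbone.
image_tokens = rng.standard_normal((num_tokens, dim))

out, attn = cross_attention(fixed_queries, image_tokens, w_k, w_v)
print(out.shape, attn.shape)
```

In this sketch, only `w_k` and `w_v` could adapt during training, so my question is whether the random but fixed `fixed_queries` carry any meaning by themselves.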

Yours sincerely.

osivaz61, Oct 02 '23 20:10