The working mechanism of the classifier
Dear author, thank you very much for your excellent work. I have a question: does the classifier compute the cosine similarity between image and text embeddings in the same way as CLIP, or is it designed differently? I couldn't find detailed information on this part.
Hi there, thank you for your interest in our work. Yes, the classifier works in the same way as CLIP, i.e., the classifier weights are essentially composed of text embeddings.
@machuofan When training, is the input on the text side the image's caption, or just a template like "a photo of ..."?
It's 'a xxx', i.e., just an article followed by the class name, not the image's caption.
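For readers landing here: the mechanism described above (classifier weights = text embeddings, logits = cosine similarity) can be sketched roughly as follows. This is a minimal illustration with random placeholder embeddings, not the authors' actual code; the function name and dimensions are made up for the example.

```python
import numpy as np

def cosine_classify(image_emb, text_embs):
    """CLIP-style classification: the classifier 'weights' are the
    L2-normalized text embeddings (one per class, from prompts like
    'a cat', 'a dog'); the logits are cosine similarities."""
    img = image_emb / np.linalg.norm(image_emb)
    txt = text_embs / np.linalg.norm(text_embs, axis=1, keepdims=True)
    logits = txt @ img  # one cosine similarity per class
    return int(np.argmax(logits))

# Placeholder embeddings for three classes (in practice these come
# from the text encoder applied to the 'a xxx' prompts).
rng = np.random.default_rng(0)
text_embs = rng.normal(size=(3, 512))
# A hypothetical image embedding close to class 1.
image_emb = text_embs[1] + 0.1 * rng.normal(size=512)
print(cosine_classify(image_emb, text_embs))  # → 1
```

Note that normalizing both sides makes the dot product equal to cosine similarity, which is why the text embeddings can simply be stacked and used as a linear classifier head.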