
The working mechanism of the classifier

Open zhangyupeng123 opened this issue 1 year ago • 3 comments

Dear author, thank you very much for your excellent work. I have a question that I would like to ask you. Is the classifier designed to calculate the cosine similarity between images and text in the same way as CLIP, or is it designed differently? I don't seem to have found detailed information on this part.

zhangyupeng123 avatar Aug 30 '24 06:08 zhangyupeng123

Hi there, thank you for your interest in our work. Yes, the classifier works in the same way as CLIP, i.e., the classifier weights are essentially composed of text embeddings.

machuofan avatar Sep 01 '24 14:09 machuofan
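For readers following the thread, here is a minimal sketch of what a CLIP-style classifier with text embeddings as weights could look like. It is not taken from the CoDet code: the function names, the toy 3-dimensional embeddings, and the `temperature` value are all assumptions made for illustration, and real text embeddings would come from a pretrained text encoder.

```python
import math

def l2_normalize(v):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def cosine_logits(region_feature, text_embeddings, temperature=0.01):
    """Score one visual feature against each class text embedding.

    The text embeddings play the role of classifier weights.
    `temperature` is an assumed value, not one confirmed by the thread.
    """
    f = l2_normalize(region_feature)
    logits = []
    for w in text_embeddings:
        w = l2_normalize(w)
        logits.append(sum(a * b for a, b in zip(f, w)) / temperature)
    return logits

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Toy stand-ins: one made-up embedding per class (hypothetical values).
class_names = ["cat", "dog", "car"]
text_embeddings = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
region_feature = [0.2, 0.9, 0.1]

probs = softmax(cosine_logits(region_feature, text_embeddings))
predicted = class_names[probs.index(max(probs))]
```

Because both sides are normalized, the score depends only on the direction of the feature and not on its magnitude, which is what lets new class names be added by appending their text embeddings.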

When training, is the input on the text side the image's title, or is it just a template like "a photo of " or "a "?

zhangyupeng123 avatar Oct 09 '24 04:10 zhangyupeng123

@machuofan When training, is the input on the text side the image's title, or is it just a template like "a photo of " or "a "?

zhangyupeng123 avatar Oct 10 '24 03:10 zhangyupeng123

It's 'a xxx'.

machuofan avatar Oct 14 '24 05:10 machuofan
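As a rough illustration of the answer above, the text-side input could be built by filling a short template with each category name. The helper name `build_prompts`, the `"a {}"` template string, and the example class names are assumptions; the thread only confirms that the prompt has the form 'a xxx'.

```python
def build_prompts(class_names, template="a {}"):
    """Fill an assumed 'a xxx' style template with each class name."""
    return [template.format(name) for name in class_names]

# Hypothetical category names, used only to show the resulting strings.
prompts = build_prompts(["cat", "traffic light"])
```

Each resulting string would then be passed through the text encoder once, and the outputs stacked to form the classifier weights.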