
The size of tag_des is 51 in code, but not clarified in paper.

Open CZX-Yui opened this issue 1 year ago • 1 comment

Brilliant work~ I have a question about a detail in your code. I notice that the "LLM Tag Des" consists of 50 sentences generated by ChatGPT, as mentioned in the paper, and the "Hand-Written" prompt is "A photo of xxx". They are compared separately. But in your code, it seems that these two prompts are concatenated, so each tag's embedding has shape (51, 512). Does this lead to better performance?


CZX-Yui, Nov 16 '23 09:11

Hi, thanks for raising this. We added 'a photo of a {tag}' mainly to handle the case where no LLM-generated descriptions are available for a tag at inference time.
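
As a rough illustration of where the 51 could come from, here is a minimal sketch, not the repository's actual code. The function name `build_tag_prompts`, the `max_llm` parameter, and the choice to append the hand-written prompt last are all assumptions made for this example. Encoding each returned prompt into a 512-dimensional text embedding would then give the (51, 512) shape mentioned above.

```python
def build_tag_prompts(tag, llm_descriptions=None, max_llm=50):
    """Collect text prompts for one tag (hypothetical helper).

    Uses up to `max_llm` LLM-generated descriptions, then always appends
    the hand-written prompt so the list is never empty.
    """
    prompts = list(llm_descriptions or [])[:max_llm]
    # Hand-written fallback; its position at the end is an assumption.
    prompts.append(f"a photo of a {tag}")
    return prompts


# A tag with 50 LLM descriptions yields 51 prompts.
with_llm = build_tag_prompts("dog", [f"description {i} of a dog" for i in range(50)])
# A tag with no LLM descriptions still yields one usable prompt.
without_llm = build_tag_prompts("dog")

print(len(with_llm), len(without_llm))
```

With this layout, a tag that has no LLM descriptions still gets at least one prompt to embed, which matches the motivation given in the reply.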

xinyu1205, Nov 17 '23 01:11