recognize-anything
The size of tag_des is 51 in the code, but this is not clarified in the paper.
Brilliant work! I have a question about a detail in your code. I notice that the "LLM Tag Des" consists of 50 sentences generated by ChatGPT, as mentioned in the paper, and the "Hand-Written" prompt is "A photo of xxx". They are compared separately. But in your code, it seems that these two prompts are concatenated, so each tag's embedding is (51, 512). Will this lead to better performance?
Hi, thanks for raising this. We added 'a photo of a {tag}' mainly to handle the case where no LLM-generated descriptions are available at inference time.
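
For anyone reading along, here is a minimal sketch of how the (51, 512) shape could come about and why the extra prompt works as a fallback. It is not the repository's actual code: `encode_text` is a hypothetical stand-in for the real text encoder, `build_tag_embedding` is an illustrative name, and placing the hand-written prompt last is an assumption.

```python
import hashlib

import numpy as np

EMBED_DIM = 512  # embedding width mentioned in the question


def encode_text(sentence: str) -> np.ndarray:
    """Hypothetical stand-in for a text encoder.

    Returns a deterministic unit vector derived from the sentence,
    so the example runs without downloading any model.
    """
    digest = hashlib.sha256(sentence.encode("utf-8")).digest()
    seed = int.from_bytes(digest[:4], "little")
    vec = np.random.default_rng(seed).standard_normal(EMBED_DIM)
    return vec / np.linalg.norm(vec)


def build_tag_embedding(tag: str, llm_descriptions: list[str]) -> np.ndarray:
    """Stack the LLM descriptions plus one hand-written prompt.

    The hand-written prompt is always appended (position is an
    assumption), so the result is never empty even when no LLM
    descriptions exist for the tag.
    """
    prompts = list(llm_descriptions) + [f"a photo of a {tag}"]
    return np.stack([encode_text(p) for p in prompts])


# 50 placeholder descriptions standing in for the ChatGPT output.
with_des = build_tag_embedding("dog", [f"description {i} of a dog" for i in range(50)])
# A tag with no LLM descriptions still gets one usable row.
without_des = build_tag_embedding("dog", [])

print(with_des.shape)     # (51, 512)
print(without_des.shape)  # (1, 512)
```

With 50 descriptions the stacked result has 51 rows. With none, the hand-written prompt alone still gives one embedding for the tag, which matches the fallback behaviour described in the reply.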