xmodaler
Image-to-text search using CLIP
Hi, dear author. In your latest CVPR 2022 paper (Comprehending and Ordering Semantics for Image Captioning), how do you retrieve semantically similar sentences for the input image using the CLIP model? Could you provide a tutorial? Thanks a lot!
You can refer to the OpenAI CLIP repository on GitHub (https://github.com/openai/CLIP) for more details.
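For anyone landing here later, the ranking step of image-to-text retrieval can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the pipeline used in the paper: it assumes the image and sentence embeddings have already been computed, for example with `clip.load`, `clip.tokenize`, `model.encode_image` and `model.encode_text` as shown in the CLIP README. The helper names `l2_normalize` and `retrieve_top_k` and the toy vectors are hypothetical and exist only for illustration.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def retrieve_top_k(image_embedding, sentence_embeddings, sentences, k=5):
    """Return the k (score, sentence) pairs most similar to the image.

    Hypothetical helper: the embeddings are assumed to come from a CLIP
    image encoder and text encoder that share one embedding space.
    """
    if len(sentence_embeddings) != len(sentences):
        raise ValueError("need exactly one embedding per sentence")
    query = l2_normalize(image_embedding)
    scored = []
    for sentence, embedding in zip(sentences, sentence_embeddings):
        unit = l2_normalize(embedding)
        # Cosine similarity between the image and this sentence.
        score = sum(q * e for q, e in zip(query, unit))
        scored.append((score, sentence))
    # Highest similarity first.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Toy 3-dimensional vectors standing in for real CLIP embeddings.
toy_image = [1.0, 0.0, 0.2]
toy_sentences = [
    "a dog runs on the grass",
    "a plate of food",
    "a dog plays outside",
]
toy_text = [[0.9, 0.1, 0.3], [0.0, 1.0, 0.0], [0.8, 0.0, 0.1]]

top = retrieve_top_k(toy_image, toy_text, toy_sentences, k=2)
for score, sentence in top:
    print(f"{score:.3f}  {sentence}")
```

With real CLIP features, the sentence embeddings for the whole sentence pool would typically be computed once and cached, so each new image only needs one image encoding plus this ranking pass.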
I see that the semantic label vocabulary has 907 entries. Why 907, and how can we find out which word each label corresponds to?
The semantics label file is uploaded to configs/image_caption/cosnet/semantics labels.txt
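To map label indices back to words, something like the sketch below might work. It assumes the labels file holds one word per line and that a label's index is its zero-based line position; neither assumption is confirmed in this thread, so check it against the file itself. The function name `load_semantic_labels` and the sample lines are hypothetical.

```python
def load_semantic_labels(lines):
    """Build index<->word lookups from an iterable of text lines.

    Assumption (not confirmed by the authors): one label per line,
    with the label index equal to the zero-based line position after
    blank lines are skipped.
    """
    labels = [line.strip() for line in lines if line.strip()]
    index_to_word = dict(enumerate(labels))
    word_to_index = {word: index for index, word in index_to_word.items()}
    return index_to_word, word_to_index


# Stand-in lines; in practice, pass the opened labels file instead.
sample_lines = ["dog\n", "grass\n", "\n", "plate\n"]
index_to_word, word_to_index = load_semantic_labels(sample_lines)
print(index_to_word[1], word_to_index["plate"])
```

Because the function accepts any iterable of lines, an open file handle can be passed in directly, e.g. `with open(path) as f: load_semantic_labels(f)`.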