CLIP
How does CLIP think without the prompts?
Hi,
Thank you for this amazing contribution!
The central question I want to raise is: without entering any prompts, what words/phrases is CLIP "thinking"? I want to know whether this model can give out all (text) outputs for a given image, not just the ones queried by prompts — that is, for an image, the set of "all" words/phrases that the model associates with it, in decreasing order of relevance.
If it's possible, how do I begin? I need some help here.
It sounds like you're attempting to maximize similarity with the image embedding by searching over text embeddings.
Researchers did that here: https://distill.pub/2021/multimodal-neurons/. However, people ran into issues trying to reproduce it: https://github.com/openai/CLIP-featurevis/issues/2
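A simpler (non-exhaustive) approximation is to score a large candidate vocabulary against the image and keep the top matches. Below is a minimal sketch of that ranking step using plain NumPy; the function name `rank_phrases` and the toy 4-d embeddings are illustrative assumptions — in practice you would obtain the embeddings from CLIP's image and text encoders (e.g. via `model.encode_image` / `model.encode_text` from the `openai/CLIP` package).

```python
import numpy as np

def rank_phrases(image_emb, text_embs, phrases):
    """Rank candidate phrases by cosine similarity to an image embedding.

    image_emb: (d,) array; text_embs: (n, d) array, one row per phrase.
    Returns (phrase, similarity) pairs sorted by decreasing similarity.
    """
    img = image_emb / np.linalg.norm(image_emb)
    txt = text_embs / np.linalg.norm(text_embs, axis=1, keepdims=True)
    sims = txt @ img  # cosine similarity of each phrase to the image
    order = np.argsort(-sims)
    return [(phrases[i], float(sims[i])) for i in order]

# Toy demo with made-up 4-d embeddings (real CLIP embeddings are 512-d+).
image_emb = np.array([1.0, 0.0, 0.0, 0.0])
phrases = ["a dog", "a cat", "a car"]
text_embs = np.stack([
    np.array([0.9, 0.1, 0.0, 0.0]),  # close to the image embedding
    np.array([0.1, 0.9, 0.0, 0.0]),
    np.array([0.0, 0.0, 1.0, 0.0]),
])
ranked = rank_phrases(image_emb, text_embs, phrases)
print(ranked[0][0])  # most similar phrase: "a dog"
```

This only surfaces phrases from the vocabulary you supply, so it can't enumerate everything CLIP "knows" — the feature-visualization work linked above tries to go further by optimizing over text directly.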