clip.cpp

Support image-only

Open fire opened this issue 1 year ago • 8 comments

Use only an image as input: scan the image and find its classes, without supplying any text.

fire avatar Jun 27 '23 21:06 fire

The zsl (zero-shot labeling) example does that. You need to pass candidate class names with the --text argument, and zsl scores those classes.

monatis avatar Jun 28 '23 08:06 monatis
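
For readers following along, here is a minimal sketch of what "scoring candidate classes" means in CLIP-style zero-shot labeling. It is not code from clip.cpp: the toy 3-dimensional embeddings, the function names, and the `logit_scale` value of 100 are assumptions made for illustration. In practice the embeddings would come from the CLIP image and text encoders.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def zero_shot_scores(image_embedding, text_embeddings, logit_scale=100.0):
    """Return {label: probability} for each candidate label.

    logit_scale is an assumed temperature; it sharpens the distribution.
    """
    labels = list(text_embeddings)
    sims = [cosine_similarity(image_embedding, text_embeddings[label])
            for label in labels]
    probs = softmax([logit_scale * s for s in sims])
    return dict(zip(labels, probs))

# Toy embeddings standing in for real encoder outputs (hypothetical values).
image_embedding = [0.9, 0.1, 0.2]
text_embeddings = {
    "a photo of a dog": [1.0, 0.0, 0.1],
    "a photo of a cat": [0.0, 1.0, 0.1],
    "a photo of a car": [0.1, 0.1, 1.0],
}

scores = zero_shot_scores(image_embedding, text_embeddings)
best_label = max(scores, key=scores.get)
print(best_label)
```

The key point is that the image is only ever compared against the candidate texts, so the result can never be a label that was not supplied.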

I do not have a general list of candidates to provide.

fire avatar Jun 28 '23 12:06 fire

If you don't have candidates, the CLIP model won't work for image classification. In that case, I guess your best bet would be to hardcode class names from a common dataset such as OpenImages by modifying the zsl example.

P.S.: I'm planning to experiment with transfer learning on the edge, training a single head layer on top of the CLIP backbone for image classification, but I don't know when yet.

monatis avatar Jun 28 '23 13:06 monatis
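
To make the "single head layer on top of the CLIP backbone" idea concrete, here is a rough sketch of a softmax linear head trained on fixed embeddings. It is only an illustration under stated assumptions, not the planned clip.cpp feature: the 2-dimensional embeddings, labels, learning rate, and function names are all hypothetical, and the backbone is represented by precomputed vectors that never change.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def head_logits(weights, biases, x):
    """Apply the linear head: one dot product plus bias per class."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, biases)]

def train_linear_head(embeddings, labels, num_classes, lr=0.5, epochs=200):
    """Fit a softmax classifier with plain SGD on frozen embeddings."""
    dim = len(embeddings[0])
    weights = [[0.0] * dim for _ in range(num_classes)]
    biases = [0.0] * num_classes
    for _ in range(epochs):
        for x, y in zip(embeddings, labels):
            probs = softmax(head_logits(weights, biases, x))
            for c in range(num_classes):
                # Gradient of cross-entropy with respect to logit c.
                grad = probs[c] - (1.0 if c == y else 0.0)
                for i in range(dim):
                    weights[c][i] -= lr * grad * x[i]
                biases[c] -= lr * grad
    return weights, biases

def predict(weights, biases, x):
    """Return the index of the highest-scoring class."""
    logits = head_logits(weights, biases, x)
    return max(range(len(logits)), key=logits.__getitem__)

# Hypothetical frozen embeddings for two classes (0 and 1).
embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
labels = [0, 0, 1, 1]

weights, biases = train_linear_head(embeddings, labels, num_classes=2)
predictions = [predict(weights, biases, x) for x in embeddings]
print(predictions)
```

Unlike zero-shot labeling, this approach needs labeled training images, but at inference time it takes only an image and no text.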

Can you go into more detail?

> common dataset such as OpenImages

So I would go to OpenImages, build a text prompt from all of its classes, and pass that on the command line?

fire avatar Jun 29 '23 18:06 fire

Yes, or only the classes that you are actually expecting to appear in the image. This is how zero-shot labeling is supposed to work.

If you could describe your exact use case, I'd try to make a more detailed comment.

monatis avatar Jun 29 '23 18:06 monatis
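
As a sketch of what that could look like in practice, the snippet below turns a list of expected class names into `--text` arguments. Only the `--text` flag comes from this thread. Passing it once per candidate, the `build_text_args` helper, and the "a photo of a {}" template are assumptions, so check the zsl example's usage output before relying on it.

```python
import shlex

def build_text_args(class_names, template="a photo of a {}"):
    """Build a flat argument list with one --text entry per class name.

    Assumes the flag may be repeated once per candidate (not verified here).
    """
    args = []
    for name in class_names:
        cleaned = name.strip()
        if not cleaned:
            continue  # skip blank entries, e.g. empty lines in a class file
        args += ["--text", template.format(cleaned)]
    return args

# Hypothetical list of classes expected to appear in the images.
expected_classes = ["dog", "cat", "bicycle"]
text_args = build_text_args(expected_classes)

# Shell-quoted form, ready to append to the rest of the command line.
print(" ".join(shlex.quote(arg) for arg in text_args))
```

Keeping the list limited to classes you actually expect tends to give cleaner scores than passing every class in a large dataset.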

Here’s an example. I want to create a Discord bot that watches for a reaction on an image and posts a message describing what the image looks like, for people who have trouble seeing but know what the words mean.

fire avatar Jun 29 '23 19:06 fire

Sounds like you want to feed the image embedding to an LLM (like LLaVA).

Green-Sky avatar Jun 29 '23 19:06 Green-Sky

Oh, got it. This is called image captioning. Numerous models exist for it, but state-of-the-art results come from models like LLaVA, which is basically CLIP + LLaMA bridged with a linear layer. I'm thinking of creating another project that combines clip.cpp with llama.cpp for efficient LLaVA inference, but this might be delayed for a week or so because there are some other features I'd like to implement in this repo first.

monatis avatar Jun 29 '23 20:06 monatis
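
The "bridged with a linear layer" part can be sketched roughly as follows. This is a toy illustration, not LLaVA's or clip.cpp's actual code: the dimensions, the random weights, and all names are hypothetical, and real models project a whole sequence of image patch features rather than the single vector used here for brevity.

```python
import random

def linear_projection(vector, weights, bias):
    """Map a vector into another space: one dot product plus bias per output row."""
    return [sum(w * v for w, v in zip(row, vector)) + b
            for row, b in zip(weights, bias)]

random.seed(0)   # reproducible toy weights
CLIP_DIM = 4     # assumed size of the image embedding
LLM_DIM = 6      # assumed size of the language model's token embeddings

# In a trained model these weights are learned; here they are random stand-ins.
weights = [[random.uniform(-0.1, 0.1) for _ in range(CLIP_DIM)]
           for _ in range(LLM_DIM)]
bias = [0.0] * LLM_DIM

image_embedding = [0.5, -0.2, 0.1, 0.7]
image_token = linear_projection(image_embedding, weights, bias)

# Stand-in for the embedded text prompt tokens (e.g. "Describe this image").
text_tokens = [[0.0] * LLM_DIM for _ in range(3)]

# The projected image feature is placed in front of the text tokens,
# and the language model then generates the caption from this sequence.
llm_input = [image_token] + text_tokens
print(len(llm_input), len(llm_input[0]))
```

The projection only changes the vector's size and space so that the language model can treat the image feature like one more token embedding.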