
Finetuning Image Text Vectorizer with CLIP

Open. singularity014 opened this issue 3 years ago • 1 comment

Hello, I tried finetuning the Image-Text Vectorizer CLIP model using the above approach, but I get stuck with the error -

Link to full code - Colab

What I need is something that gives the cosine similarity between an image and a text. Should I finetune with the triplet variant or with the cosine similarity variant? If it's cosine similarity, how do I get those cosine similarity values?

The triplet variant takes text and image and gives one normalised vector. I am a bit confused because I thought it would give a cosine similarity.

Oct 06 '21 03:10 singularity014

Hey,

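Your code is not public, so I can't see it. You should be able to finetune with either variant. Cosine similarity is just a function of two vectors: vectorise the image and the text separately, then compute the similarity between the two resulting vectors.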

See an example of it being used here: https://github.com/backprop-ai/backprop/blob/main/examples/ImageVectorisation.ipynb
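To make "a function between two vectors" concrete, here is a minimal sketch in plain Python. The `cosine_similarity` helper and the `image_vec` / `text_vec` values are illustrative assumptions, not backprop's API or real CLIP outputs; in practice the two lists would be the vectors returned by whichever vectorisation call you use.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (hypothetical helper)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Placeholder values standing in for an image embedding and a text embedding.
image_vec = [0.6, 0.8, 0.0]
text_vec = [0.8, 0.6, 0.0]

score = cosine_similarity(image_vec, text_vec)
print(score)
```

If both vectors are already normalised to unit length, the two norms are 1 and the cosine similarity reduces to the dot product.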

Oct 28 '21 16:10 ojasaar
