
Embedding sequence of images into CLIP

Open justlike-prog opened this issue 2 years ago • 2 comments

Hi,

Is there a way to use CLIP to embed whole albums of photos and then check their similarity with certain phrases describing what the album is about? I'm having a hard time figuring out how to encode a sequence of images with CLIP. Maybe there are some papers about it out there?

Thanks for any suggestions!

justlike-prog avatar Dec 27 '22 15:12 justlike-prog

I'd use the CLIP image encoder to get an embedding for each image, and then either:

  1. simply take the average of all those image embeddings and use that as the "album embedding"; or
  2. given an input text, calculate the cosine similarity between its embedding and each of the image embeddings, and check whether the majority of them are higher than a certain value.
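
The two options can be sketched roughly as follows. This is only a minimal illustration in plain Python, assuming the image and text embeddings have already been computed (e.g. with encode_image and the text encoder) and converted to lists of floats. The helper names, the choice to normalize each embedding before averaging, and the default threshold are all assumptions made for the example, not something prescribed by CLIP.

```python
import math

def normalize(v):
    """Scale a vector to unit length (assumes it is not all zeros)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def cosine(a, b):
    """Cosine similarity between two vectors of equal length."""
    a, b = normalize(a), normalize(b)
    return sum(x * y for x, y in zip(a, b))

def album_embedding(image_embeddings):
    """Option 1: average the per-image embeddings into one album vector.

    Each embedding is normalized first so that no single image dominates
    the mean (an assumption of this sketch).
    """
    unit = [normalize(e) for e in image_embeddings]
    dim = len(unit[0])
    mean = [sum(e[i] for e in unit) / len(unit) for i in range(dim)]
    return normalize(mean)

def majority_match(image_embeddings, text_embedding, threshold=0.25):
    """Option 2: True if more than half of the images exceed the threshold.

    The default threshold is a hypothetical value; it would need tuning
    on real data.
    """
    hits = sum(cosine(e, text_embedding) > threshold for e in image_embeddings)
    return hits > len(image_embeddings) / 2

# Toy 3-dimensional stand-ins for real 512-dimensional CLIP embeddings.
album = [[1.0, 0.0, 0.0], [0.8, 0.6, 0.0], [0.0, 0.0, 1.0]]
text = [1.0, 0.0, 0.0]

album_score = cosine(album_embedding(album), text)        # option 1
album_matches = majority_match(album, text, threshold=0.5)  # option 2
```

Option 1 gives a single score per phrase, which is convenient for ranking albums, while option 2 is less sensitive to a few off-topic photos in an otherwise consistent album.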

jongwook avatar Jan 09 '23 09:01 jongwook

Hi, with image_features = model.encode_image(image), the output of encode_image is a tensor of dimension 512. Do you know how to control the dimension of the output? E.g., I want 128 or 64 dimensions. Many thanks.

sdalinluo avatar Feb 23 '23 10:02 sdalinluo