Embedding a sequence of images with CLIP
Hi,
Is there a way to use CLIP to embed whole albums of photos and then check their similarity against phrases describing what the album is about? I'm having a hard time figuring out how to encode a sequence of images with CLIP. Maybe there are some papers about this out there?
Thanks for any suggestions!
I'd use the CLIP image encoder to get an embedding for each image, and then either:
- simply take the average of all those image embeddings and use that as the "album embedding"; or
- given an input text, calculate its cosine similarity with each image embedding, and check whether the majority of the similarities are higher than a certain value.
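A rough sketch of both options, working on embeddings that have already been computed (in practice the arrays would come from model.encode_image and model.encode_text, converted to NumPy). The function names, the L2-normalization before averaging, and the 0.25 default threshold are my own assumptions for illustration, not something CLIP prescribes; the threshold in particular needs tuning on your data.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale vectors along the last axis to unit length."""
    norm = np.linalg.norm(x, axis=-1, keepdims=True)
    return x / np.maximum(norm, eps)


def album_embedding(image_embs):
    """Option 1: average the (normalized) image embeddings.

    image_embs has shape (num_images, dim). Normalizing first is an
    assumption so that every image contributes equally to the mean.
    """
    mean = l2_normalize(image_embs).mean(axis=0)
    return l2_normalize(mean)


def album_similarity(image_embs, text_emb):
    """Cosine similarity between the album embedding and a text embedding."""
    return float(album_embedding(image_embs) @ l2_normalize(text_emb))


def majority_match(image_embs, text_emb, threshold=0.25):
    """Option 2: per-image cosine similarity plus a majority vote.

    Returns (matched, sims), where matched is True if more than half of
    the images score above the (hypothetical) threshold.
    """
    sims = l2_normalize(image_embs) @ l2_normalize(text_emb)
    matched = bool((sims > threshold).mean() > 0.5)
    return matched, sims
```

You would call album_similarity(image_embs, text_emb) once per candidate phrase and pick the highest score, or use majority_match when a few off-topic photos in the album should not drag the result down.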
Hi, with image_features = model.encode_image(image), the output of encode_image is a tensor of dimension 512. Do you know how to control the output dimension? E.g., I want 128 or 64 dimensions. Many thanks.