Question: Image to Token(s) encoder?

Open chi0tzp opened this issue 2 years ago • 2 comments

Hi, I'm interested in training an encoder that maps images to the token space.

Maybe something like a ResNet backbone that learns to map input images to token embeddings. These token embeddings, after being passed through the CLIP text encoder, should yield embeddings close to the image embeddings produced by the CLIP image encoder.
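A minimal sketch of that setup, to make the objective concrete. Everything here is an assumption for illustration: `ImageToTokens`, `TextEncoderStandIn`, `alignment_loss`, and `demo` are hypothetical names, a small CNN replaces the ResNet backbone, and the two frozen encoders are tiny stand-ins rather than real CLIP modules. With a real CLIP model, the text encoder normally takes token IDs, so the learned embeddings would have to be fed in after the token-embedding lookup.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ImageToTokens(nn.Module):
    """Hypothetical encoder: image -> n_tokens pseudo-token embeddings."""

    def __init__(self, n_tokens=4, token_dim=16):
        super().__init__()
        self.n_tokens, self.token_dim = n_tokens, token_dim
        # Small CNN standing in for a ResNet backbone (assumption).
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.head = nn.Linear(64, n_tokens * token_dim)

    def forward(self, images):
        feats = self.backbone(images)  # (B, 64)
        return self.head(feats).view(-1, self.n_tokens, self.token_dim)


class TextEncoderStandIn(nn.Module):
    """Stand-in for a frozen text encoder that accepts token embeddings."""

    def __init__(self, token_dim=16, embed_dim=16):
        super().__init__()
        self.proj = nn.Linear(token_dim, embed_dim)

    def forward(self, tokens):  # tokens: (B, n_tokens, token_dim)
        return self.proj(tokens.mean(dim=1))  # (B, embed_dim)


def alignment_loss(text_emb, image_emb):
    """Mean of 1 - cosine similarity between the two embedding sets."""
    return (1 - F.cosine_similarity(text_emb, image_emb, dim=-1)).mean()


def demo(steps=3):
    torch.manual_seed(0)
    encoder = ImageToTokens()
    text_encoder = TextEncoderStandIn()
    # Stand-in for the frozen image encoder (assumption).
    image_encoder = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 16))
    # Freeze both pretrained-style encoders; only `encoder` is trained.
    for module in (text_encoder, image_encoder):
        for p in module.parameters():
            p.requires_grad_(False)

    optimizer = torch.optim.Adam(encoder.parameters(), lr=1e-3)
    images = torch.randn(2, 3, 32, 32)  # random data in place of real images
    losses = []
    for _ in range(steps):
        tokens = encoder(images)
        text_emb = text_encoder(tokens)  # gradients flow back to `tokens`
        with torch.no_grad():
            target = image_encoder(images)
        loss = alignment_loss(text_emb, target)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        losses.append(loss.item())
    return tuple(tokens.shape), losses, text_encoder


if __name__ == "__main__":
    shape, losses, _ = demo()
    print(shape, losses)
```

Freezing the two encoders by disabling `requires_grad` on their parameters still lets gradients pass through the text encoder to the predicted tokens, which is what the image-to-token encoder needs in order to learn.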

Has anyone tried anything like this?

Thank you!

chi0tzp, Dec 23 '21 11:12

Are you familiar with CoOp? It does something similar:

https://github.com/KaiyangZhou/CoOp

Rijgersberg, Dec 23 '21 11:12

Hi @Rijgersberg, I wasn't aware of CoOp, many thanks for sharing!

chi0tzp, Dec 23 '21 12:12