Giovanni Puccetti
@rwightman @vturrisi this was the intention, because the embedding that makes sense to use for contrastive downstream tasks with coca is the one output by the pooler. The only detail...
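As a rough sketch of what that looks like (my own illustration, not the open_clip code; the `AttentionalPooler` name and the 256 + 1 query split are assumptions for the example): the pooler turns the encoder tokens into a small set of pooled vectors, the extra query is the embedding you would use for contrastive downstream tasks, and the remaining queries feed the captioning decoder.

```python
import torch
import torch.nn as nn

# Hedged sketch, not the open_clip implementation: an attentional pooler with
# 256 + 1 learned queries, where the extra query is used as the contrastive
# embedding and the other 256 feed the captioning decoder.
class AttentionalPooler(nn.Module):
    def __init__(self, dim: int, n_queries: int, heads: int = 8):
        super().__init__()
        self.queries = nn.Parameter(torch.randn(n_queries, dim))
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        # tokens: (batch, seq_len, dim) encoder outputs used as keys/values
        q = self.queries.unsqueeze(0).expand(tokens.size(0), -1, -1)
        pooled, _ = self.attn(q, tokens, tokens, need_weights=False)
        return pooled

pooler = AttentionalPooler(dim=512, n_queries=256 + 1)
image_tokens = torch.randn(2, 196, 512)      # e.g. ViT patch tokens
pooled = pooler(image_tokens)
contrastive_emb = pooled[:, 0]               # embedding for contrastive tasks
caption_tokens = pooled[:, 1:]               # tokens consumed by the decoder
```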
@rwightman ah, now I see, I made a mistake reading the paper; I thought it worked the way I wrote it.
@rwightman ok, indeed I was thinking about it; I believe that two poolers, or a single one with one extra query, are the same except for the linear layers shared inside MultiHeadAttention
> I don't see how they'd be equivalent with the softmax there... @rwightman maybe I am just in denial, however, each row of the attention is one query dot product...
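To make the claim concrete, here is a minimal sketch (my own illustration with plain single-head attention, not the actual open_clip pooler): since the softmax is taken row-wise over the keys, each query produces its output independently of the other queries, so one pooler with two queries matches two single-query poolers as long as the projection weights are shared.

```python
import torch

def attn_pool(queries, tokens):
    # softmax over the keys, computed independently for each query row
    scores = queries @ tokens.T / tokens.shape[-1] ** 0.5
    return scores.softmax(dim=-1) @ tokens

torch.manual_seed(0)
tokens = torch.randn(196, 512)     # encoder tokens (keys/values)
queries = torch.randn(2, 512)      # two learned pooler queries

together = attn_pool(queries, tokens)                       # one pooler, two queries
separate = torch.cat([attn_pool(queries[i:i + 1], tokens)   # two single-query poolers
                      for i in range(2)])
print(torch.allclose(together, separate))                   # True: the queries never interact
```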
@vturrisi I am planning a PR that should improve the huggingface integration along with a few other changes; I will add that in as soon as I start working on...
Hi @John-Kieron, is this on the latest version?
@John-Kieron I can't manage to replicate this. Can you share some more info? Did you make any changes to the code?
Hi @vedantroy, for the different special tokens I don't know if there is a specific reason. As for the exclamation marks, the reason they are there is that the tokenizer uses...
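For context, a small sketch of my understanding (my own example using the standard open_clip tokenizer helpers; the padding behaviour described is an assumption here): tokenize pads sequences with token id 0 up to the context length, and id 0 decodes to "!" in the CLIP BPE vocabulary, which is why decoded padded sequences end in exclamation marks.

```python
import open_clip
from open_clip.tokenizer import SimpleTokenizer

# Assumption for illustration: padding uses token id 0, and id 0 decodes to "!".
tokens = open_clip.tokenize(["a photo of a cat"])[0]
print(tokens[-5:])                          # trailing zeros are padding
print(SimpleTokenizer().decode([0, 0, 0]))  # "!!!"
```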
Sure, I will try to reuse as much as I can; for now it is mostly copied from the coca-pytorch repo, and I will probably ask for some help while I move...
@rom1504 I will reuse the visual_model from open_clip; however, in coca-pytorch the transformer layers for the text model are different from the regular ones, feed_forward and attention are parallel, do...
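To show the difference I mean (a hedged sketch, not the coca-pytorch or open_clip code): a regular pre-norm block applies attention and the feed-forward one after the other, while the parallel variant computes both from the same normalized input and adds them to the residual in one step.

```python
import torch
import torch.nn as nn

def mlp(dim: int, ratio: int = 4) -> nn.Sequential:
    return nn.Sequential(nn.Linear(dim, dim * ratio), nn.GELU(), nn.Linear(dim * ratio, dim))

class SequentialBlock(nn.Module):
    """Regular pre-norm block: attention, then feed-forward."""
    def __init__(self, dim: int, heads: int = 8):
        super().__init__()
        self.norm1, self.norm2 = nn.LayerNorm(dim), nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.ff = mlp(dim)

    def forward(self, x):
        h = self.norm1(x)
        x = x + self.attn(h, h, h, need_weights=False)[0]
        return x + self.ff(self.norm2(x))

class ParallelBlock(nn.Module):
    """Parallel block: attention and feed-forward read the same normalized input."""
    def __init__(self, dim: int, heads: int = 8):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.ff = mlp(dim)

    def forward(self, x):
        h = self.norm(x)
        return x + self.attn(h, h, h, need_weights=False)[0] + self.ff(h)
```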