Raunak
Results: 2 comments of Raunak
AFAIK, the latent space of CLOOB seems to align the text and image modalities much better than CLIP's. Below are two plots I saw someone post on EleutherAI's Discord where...
My expectation is that in the case of two-tower setups, we might see better-aligned embeddings. (I don't think this approach is meant for single-tower setups.) Other than that,...