Hrithick Sen

4 comments by Hrithick Sen

CLIP's ViT-Large (the 336px variant) takes images of size 336x336. It's inevitable that there will be some loss of information when downscaling a 1024x1024 image to 336x336, but I think CLIP is robust...
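For reference, this is roughly what the preprocessing does, a minimal sketch assuming the openai/clip-vit-large-patch14-336 checkpoint and a recent transformers release (older versions expose the same behaviour under CLIPFeatureExtractor):

```python
from PIL import Image
from transformers import CLIPImageProcessor

# The 336px ViT-Large checkpoint resizes and center-crops every input to 336x336.
processor = CLIPImageProcessor.from_pretrained("openai/clip-vit-large-patch14-336")

image = Image.new("RGB", (1024, 1024))  # stand-in for a real 1024x1024 image
pixel_values = processor(images=image, return_tensors="pt")["pixel_values"]
print(pixel_values.shape)  # torch.Size([1, 3, 336, 336])
```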

It is not easy to increase/decrease the dimension of the image embedding without fine-tuning CLIP again. So, what we can do is attach a Projection layer at the end of...
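Something along these lines should work as a rough sketch; the checkpoint name, the wrapper class, and out_dim=512 are placeholders you would pick for your own task:

```python
import torch
import torch.nn as nn
from transformers import CLIPModel

class CLIPImageProjector(nn.Module):
    """Frozen CLIP image encoder followed by a small trainable projection head."""

    def __init__(self, clip_name="openai/clip-vit-large-patch14-336", out_dim=512):
        super().__init__()
        self.clip = CLIPModel.from_pretrained(clip_name)
        # Keep CLIP frozen; only the projection head is trained.
        for p in self.clip.parameters():
            p.requires_grad = False
        in_dim = self.clip.config.projection_dim  # 768 for ViT-Large
        self.proj = nn.Linear(in_dim, out_dim)

    def forward(self, pixel_values):
        with torch.no_grad():
            feats = self.clip.get_image_features(pixel_values=pixel_values)
        return self.proj(feats)
```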

The default length is 77; you might need to perform some more fine-tuning to accept larger text sequences. Below is how you can do it.

```python
from transformers import (...
```
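Roughly, the idea is to enlarge the text encoder's position embedding table, copy the pretrained weights into the first 77 slots, and fine-tune the randomly initialized new positions. A minimal sketch with the Hugging Face transformers CLIP classes (the base-patch32 checkpoint and the 128-token target are just placeholders):

```python
import torch
from transformers import CLIPModel, CLIPTokenizer

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-base-patch32")

new_max_len = 128  # hypothetical target length; CLIP ships with 77
text_emb = model.text_model.embeddings
old_pos = text_emb.position_embedding  # nn.Embedding(77, hidden_size)

# Build a larger position embedding and copy the pretrained 77 rows into it;
# the remaining rows start random and need fine-tuning.
new_pos = torch.nn.Embedding(new_max_len, old_pos.embedding_dim)
with torch.no_grad():
    new_pos.weight[: old_pos.num_embeddings] = old_pos.weight
text_emb.position_embedding = new_pos
text_emb.position_ids = torch.arange(new_max_len).expand((1, -1))

# Keep the config and tokenizer in sync so inputs up to 128 tokens are accepted.
model.config.text_config.max_position_embeddings = new_max_len
tokenizer.model_max_length = new_max_len
```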

@chuyihuan If you are planning to fine-tune CLIP, then the snippet can be useful. Yes, transformers is a library.