Hrithick Sen

4 comments by Hrithick Sen

CLIP's ViT-Large (the 336px variant) takes images of size 336x336. It's inevitable that there will be some loss of information when downscaling a 1024x1024 image to 336x336, but I think CLIP is robust...
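For reference, this is roughly what the preprocessing does, a minimal sketch assuming the openai/clip-vit-large-patch14-336 checkpoint and a recent transformers release (older versions expose the same behaviour under CLIPFeatureExtractor):

```python
from PIL import Image
from transformers import CLIPImageProcessor

# The 336px ViT-Large checkpoint resizes and center-crops every input to 336x336.
processor = CLIPImageProcessor.from_pretrained("openai/clip-vit-large-patch14-336")

image = Image.new("RGB", (1024, 1024))  # stand-in for a real 1024x1024 image
pixel_values = processor(images=image, return_tensors="pt")["pixel_values"]
print(pixel_values.shape)  # torch.Size([1, 3, 336, 336])
```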

It is not easy to increase/decrease the dimension of the image embedding without fine-tuning CLIP again. So, what we can do is attach a Projection layer at the end of...
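Something along these lines should work as a rough sketch; the checkpoint name, the wrapper class, and out_dim=512 are placeholders you would pick for your own task:

```python
import torch
import torch.nn as nn
from transformers import CLIPModel

class CLIPImageProjector(nn.Module):
    """Frozen CLIP image encoder followed by a small trainable projection head."""

    def __init__(self, clip_name="openai/clip-vit-large-patch14-336", out_dim=512):
        super().__init__()
        self.clip = CLIPModel.from_pretrained(clip_name)
        # Keep CLIP frozen; only the projection head is trained.
        for p in self.clip.parameters():
            p.requires_grad = False
        in_dim = self.clip.config.projection_dim  # 768 for ViT-Large
        self.proj = nn.Linear(in_dim, out_dim)

    def forward(self, pixel_values):
        with torch.no_grad():
            feats = self.clip.get_image_features(pixel_values=pixel_values)
        return self.proj(feats)
```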

The default length is 77; you might need to perform some more fine-tuning to accept larger text sequences. Below is how you can do it.

```python
from transformers import (...
```
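Roughly, the idea is to enlarge the text encoder's position embedding table, copy the pretrained weights into the first 77 slots, and fine-tune the randomly initialized new positions. A minimal sketch with the Hugging Face transformers CLIP classes (the base-patch32 checkpoint and the 128-token target are just placeholders):

```python
import torch
from transformers import CLIPModel, CLIPTokenizer

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-base-patch32")

new_max_len = 128  # hypothetical target length; CLIP ships with 77
text_emb = model.text_model.embeddings
old_pos = text_emb.position_embedding  # nn.Embedding(77, hidden_size)

# Build a larger position embedding and copy the pretrained 77 rows into it;
# the remaining rows start random and need fine-tuning.
new_pos = torch.nn.Embedding(new_max_len, old_pos.embedding_dim)
with torch.no_grad():
    new_pos.weight[: old_pos.num_embeddings] = old_pos.weight
text_emb.position_embedding = new_pos
text_emb.position_ids = torch.arange(new_max_len).expand((1, -1))

# Keep the config and tokenizer in sync so inputs up to 128 tokens are accepted.
model.config.text_config.max_position_embeddings = new_max_len
tokenizer.model_max_length = new_max_len
```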

@chuyihuan If you are planning to fine-tune CLIP, then the snippet can be useful. Yes, transformers is a library.