CLIP Input image size

Open MarkIsDoingIt opened this issue 1 year ago • 1 comments

I want to use a pretrained vision transformer from CLIP to extract features from images. My original image size is 1024x1024. What is the largest input image size supported by any pretrained CLIP version? Resizing the images to the classic 224x224 will lose information. Thanks!

Jul 20 '23 20:07 MarkIsDoingIt

CLIP's ViT-Large has a variant that takes 336x336 input (ViT-L/14@336px). Some loss of information is inevitable when you downscale a 1024x1024 image to 336x336, but I think CLIP is robust enough. Make sure you use ViT-L, which has the smallest patch size (14) of the CLIP vision transformers.
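To put numbers on the resolution and patch-size trade-off, here is a rough sketch that works out how many patches each CLIP ViT variant sees and how much a 1024x1024 image shrinks. The table of resolutions and patch sizes reflects my understanding of the released OpenAI checkpoints, and the helper names `patch_grid` and `downscale_factor` are made up for illustration; they are not part of any CLIP API.

```python
from typing import Dict, Tuple

# (input resolution, patch size) per ViT variant. Assumed from the model
# identifiers used by the openai/CLIP package, not read from the weights.
CLIP_VIT_VARIANTS: Dict[str, Tuple[int, int]] = {
    "ViT-B/32": (224, 32),
    "ViT-B/16": (224, 16),
    "ViT-L/14": (224, 14),
    "ViT-L/14@336px": (336, 14),
}


def patch_grid(resolution: int, patch_size: int) -> Tuple[int, int]:
    """Return (patches per side, total patches) for a square input."""
    if resolution % patch_size != 0:
        raise ValueError("resolution must be a multiple of patch_size")
    per_side = resolution // patch_size
    return per_side, per_side * per_side


def downscale_factor(original: int, resolution: int) -> float:
    """Linear shrink factor when resizing a square image of side `original`."""
    return original / resolution


if __name__ == "__main__":
    for name, (res, patch) in CLIP_VIT_VARIANTS.items():
        per_side, total = patch_grid(res, patch)
        factor = downscale_factor(1024, res)
        print(f"{name}: {res}x{res} input, {per_side}x{per_side} = {total} "
              f"patches, 1024 -> {res} is a {factor:.2f}x downscale")
```

With the openai/CLIP package, the 336 variant should be loadable as `clip.load("ViT-L/14@336px")`, and the `preprocess` transform it returns takes care of resizing to 336.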

Jul 27 '23 20:07 hrithickcodes
