CLIP
Acceptable Size of Images for CLIP
I wonder if CLIP accepts images smaller than 224 x 224 (e.g. 32 x 32). I noticed that an error about matrix dimensions occurs if I use 32 x 32 images as input to CLIP without modifying the model.
Did you find a solution? I am reshaping all images to 224X224, but that seems a bit fishy, especially with the varying aspect ratio.
Just exploring for now; still in exploration mode.
Best, N
I found my mistake: I forgot to preprocess the image.
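For anyone hitting the same error, here is a minimal sketch of that fix, assuming the official openai/CLIP package: the preprocess transform returned by clip.load() resizes and crops any input image (including a 32x32 one) to the 224x224 resolution the model expects. "example.png" is a placeholder filename.

```python
# Minimal sketch assuming the openai/CLIP package; "example.png" is a placeholder.
import torch
import clip
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

# preprocess resizes/crops the image to 224x224 and normalizes it,
# so even a 32x32 input ends up with the shape the model expects.
image = preprocess(Image.open("example.png")).unsqueeze(0).to(device)  # (1, 3, 224, 224)

with torch.no_grad():
    image_features = model.encode_image(image)
print(image_features.shape)  # (1, 512) for ViT-B/32
```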
Reshaping to 224x224 was how I handled smaller images as well. I expect that by slightly modifying the model, without retraining, CLIP could take in smaller images.
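One common way to do that kind of modification is to interpolate the vision transformer's positional embedding to match a new grid size. Below is an untested sketch of that trick; the attribute names follow the openai/CLIP ViT implementation. Note that the patch size stays fixed, so a 32x32 input with a /32 patch size would give only a single patch, and accuracy without fine-tuning is not guaranteed.

```python
# Untested sketch: resize CLIP-ViT's positional embedding so the model accepts
# a different input resolution. Attribute names follow the openai/CLIP ViT code.
import torch
import torch.nn.functional as F
import clip

model, _ = clip.load("ViT-B/32", device="cpu")
visual = model.visual

new_res = 64                                    # hypothetical target resolution
patch = visual.conv1.kernel_size[0]             # 32 for ViT-B/32
old_grid = round((visual.positional_embedding.shape[0] - 1) ** 0.5)  # 7
new_grid = new_res // patch                     # 2

with torch.no_grad():
    pos = visual.positional_embedding           # (old_grid**2 + 1, width)
    cls_pos, grid_pos = pos[:1], pos[1:]
    # reshape the grid tokens to (1, width, old_grid, old_grid) and resample
    grid_pos = grid_pos.reshape(1, old_grid, old_grid, -1).permute(0, 3, 1, 2)
    grid_pos = F.interpolate(grid_pos, size=(new_grid, new_grid),
                             mode="bicubic", align_corners=False)
    grid_pos = grid_pos.permute(0, 2, 3, 1).reshape(new_grid * new_grid, -1)
    visual.positional_embedding = torch.nn.Parameter(
        torch.cat([cls_pos, grid_pos], dim=0))

# The model should now run on new_res x new_res inputs, though quality
# without fine-tuning is not guaranteed.
```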
I haven't tried it, but probably you would have to pass a different argument to CLIPFeatureExtractor.
https://huggingface.co/docs/transformers/v4.19.2/en/model_doc/clip#transformers.CLIPFeatureExtractor
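An untested sketch based on the linked docs: CLIPFeatureExtractor exposes size and crop_size arguments that control the resize and center crop. Note that the pretrained checkpoint's vision tower still expects 224x224 input, so changing these values mainly changes the preprocessing, not what the model itself accepts.

```python
# Untested sketch based on the transformers docs linked above.
from PIL import Image
from transformers import CLIPFeatureExtractor

# size / crop_size control the preprocessing resize and center crop;
# the pretrained vision model itself still expects 224x224 input.
feature_extractor = CLIPFeatureExtractor.from_pretrained(
    "openai/clip-vit-base-patch32",
    size=224,
    crop_size=224,
)

image = Image.open("example.png")  # placeholder filename
inputs = feature_extractor(images=image, return_tensors="pt")
print(inputs["pixel_values"].shape)  # torch.Size([1, 3, 224, 224])
```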
How can I change the image resolution from 224 to 32? I am using the CIFAR-10 dataset, and its images are 32x32.
You should upsample the images to 224x224. The ResNet backbone downsamples by 32x, and the ViT needs a fixed-size input to split into patches, so smaller images will cause problems in several places.
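In practice the easiest route for CIFAR-10 is to let CLIP's own preprocess transform do the upsampling, roughly as in the CIFAR example from the CLIP README. A sketch assuming the openai/CLIP package and torchvision:

```python
# Sketch: let CLIP's preprocess upsample CIFAR-10's 32x32 images to 224x224.
import torch
import clip
from torchvision.datasets import CIFAR10

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

# preprocess resizes each 32x32 PIL image to 224x224 and normalizes it
dataset = CIFAR10(root="./data", download=True, transform=preprocess)
loader = torch.utils.data.DataLoader(dataset, batch_size=32)

images, labels = next(iter(loader))
with torch.no_grad():
    features = model.encode_image(images.to(device))
print(features.shape)  # (32, 512) for ViT-B/32
```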
Same problem here; my server is too weak to handle 224x224 images.