
Question: What is the minimum size of an image that can be classified?

Open ChangGiMoon opened this issue 1 year ago • 1 comments

I want to use InternVL2-8B for binary classification (e.g. yes or no) on very small images. Specifically, I plan to take the cropped bounding-box patches produced by an object detector, feed them to InternVL2-8B, and verify whether the predicted class of each bounding box is correct. Can you tell me the approximate minimum image size that can be classified? I will use the prompt below. The input images vary in size and appearance, as in the example below.

prompt: Based on the given image, answer the following question with 'yes' or 'no': Question: [Is there a person in this image?], Answer:

input image example: [image: ex]

ChangGiMoon avatar Sep 20 '24 02:09 ChangGiMoon

The minimum image size should be equal to or greater than the patch size, which is 14 by default.

qishisuren123 avatar Sep 21 '24 08:09 qishisuren123

> The minimum image size should be equal to or greater than the patch size, which is 14 by default.

@qishisuren123 Hi, a small image will be scaled to a multiple of the input_size (448) during preprocessing, right? So why does it need to be >= the patch size of 14? Did I miss something?

heibaidaolx123 avatar Oct 22 '24 06:10 heibaidaolx123

Yes, a small image will be resized to a multiple of 448 during preprocessing.

czczup avatar Dec 09 '24 11:12 czczup