
Hi, have you compared S2 with scales [384, 768] against interpolating to 768x768?

Open • OpenJarvisAI opened this issue 1 year ago • 6 comments

The way you're using it actually feeds 5 images into the ViT.

How does it compare with interpolating to 768x768, which is equivalent to sending 4 images into the ViT, just in a different manner?
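One rough way to compare the two options is to count ViT work. The sketch below assumes a hypothetical ViT with 16x16 patches and no CLS token (real backbones may use a different patch size). It counts forward passes, patch tokens, and token pairs in self-attention, which grows quadratically with sequence length.

```python
# Back-of-the-envelope ViT cost: S2 with scales [384, 768] vs. one 768x768 pass.
# Assumption (hypothetical): 16x16 patches, no prefix/CLS token.

PATCH = 16
BASE = 384


def tokens(side: int, patch: int = PATCH) -> int:
    """Number of patch tokens for a square image with the given side length."""
    return (side // patch) ** 2


def s2_cost(scales, base=BASE):
    """Each scale is split into (scale // base)**2 base-size crops, one ViT pass each."""
    crops = sum((s // base) ** 2 for s in scales)
    n = tokens(base)
    return {"forward_passes": crops, "tokens": crops * n, "attn_pairs": crops * n * n}


def direct_cost(side):
    """One ViT pass over the full image (assumes interpolated positional embeddings)."""
    n = tokens(side)
    return {"forward_passes": 1, "tokens": n, "attn_pairs": n * n}


s2 = s2_cost([384, 768])
direct = direct_cost(768)
print(s2)      # {'forward_passes': 5, 'tokens': 2880, 'attn_pairs': 1658880}
print(direct)  # {'forward_passes': 1, 'tokens': 2304, 'attn_pairs': 5308416}
```

Under these assumptions, S2 pushes more tokens through the linear layers (2880 vs. 2304), but its attention cost is 5 base-size units versus 16 for the single 768x768 pass, because the large image is attended over as one long sequence. This is only an arithmetic illustration and does not reproduce the paper's measurements.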

OpenJarvisAI • May 07 '24 14:05

Good point. In the paper we compare S2 with directly extracting features from the larger image without splitting (Table 12), and it turns out that the latter is much less efficient and performs worse than S2.

bfshi • May 07 '24 21:05

Hi, it looks like that comparison is on the segmentation task. Was it also compared on LLaVA tasks? Also, what's the most effective way to reduce the final number of tokens when using S2?

lucasjinreal • May 10 '24 03:05

Hi, yes, the paper only compares on segmentation. S2 uses average pooling to resize the large-scale feature map to the regular size. To further reduce the number of tokens, you can use mlp_downsample here
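As a rough illustration of the pooling step described above, here is a minimal NumPy sketch. It assumes channel-first (C, H, W) feature maps whose side is an integer multiple of the base grid, and it concatenates the pooled maps along the channel axis, which is how the S2 paper describes combining scales. The function names are hypothetical; the actual S2 code works on PyTorch tensors.

```python
import numpy as np


def avg_pool_to(feat: np.ndarray, out_hw: int) -> np.ndarray:
    """Average-pool a (C, H, W) feature map down to (C, out_hw, out_hw).

    Assumes H == W and H is an integer multiple of out_hw (e.g. a 2x scale).
    """
    c, h, w = feat.shape
    assert h == w and h % out_hw == 0
    k = h // out_hw  # pooling window size and stride
    # Split each spatial axis into (out_hw, k) blocks and average over each k x k block.
    return feat.reshape(c, out_hw, k, out_hw, k).mean(axis=(2, 4))


def combine_scales(feats, base_hw):
    """Pool every scale's feature map to the base grid, then concat on channels."""
    return np.concatenate([avg_pool_to(f, base_hw) for f in feats], axis=0)


# Hypothetical sizes: a 24x24 grid for the 384px scale, 48x48 for 768px, C=8.
rng = np.random.default_rng(0)
f384 = rng.standard_normal((8, 24, 24))
f768 = rng.standard_normal((8, 48, 48))
out = combine_scales([f384, f768], base_hw=24)
print(out.shape)  # (16, 24, 24)
```

Pooling keeps the token count at the base grid size (24x24 here) but multiplies the channel dimension by the number of scales.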

bfshi • May 15 '24 06:05

I see the code just has an mlp_downsample, and the ViT outputs don't change. Is the avg pooling you mentioned the mlp_downsample?

Is it specifically the flat_square that does the avg pooling?

lucasjinreal • May 15 '24 11:05

Hi, mlp_downsample concatenates each group of adjacent 2x2 tokens into a single token. The average pooling, on the other hand, is implemented inside S2: S2 pools the feature map of a large-scale image down to the size that corresponds to a regular-size image. See the code here
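A minimal NumPy sketch of the 2x2 token concatenation described above, assuming a row-major (h*w, C) token sequence with even h and w. The function name is hypothetical, and the MLP that would follow the concatenation is omitted.

```python
import numpy as np


def concat_2x2(tokens: np.ndarray, h: int, w: int) -> np.ndarray:
    """Merge each 2x2 neighbourhood of an (h*w, C) token grid into one token.

    Returns shape (h//2 * w//2, 4*C). Assumes h and w are even (a real
    implementation might pad odd grids first).
    """
    n, c = tokens.shape
    assert n == h * w and h % 2 == 0 and w % 2 == 0
    # (h*w, C) -> (h/2, 2, w/2, 2, C): split rows and columns into pairs.
    grid = tokens.reshape(h // 2, 2, w // 2, 2, c)
    # Bring the two in-block axes next to the channel axis: (h/2, w/2, 2, 2, C).
    grid = grid.transpose(0, 2, 1, 3, 4)
    # Flatten each 2x2 block (top-left, top-right, bottom-left, bottom-right) into channels.
    return grid.reshape((h // 2) * (w // 2), 4 * c)


toks = np.arange(16).reshape(16, 1)  # a 4x4 grid with one channel
merged = concat_2x2(toks, 4, 4)
print(merged[0].tolist())  # [0, 1, 4, 5]
```

With a 24x24 grid this turns 576 tokens into 144, each with 4x the channels, so the projector's MLP sees a quarter as many tokens.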

bfshi • May 15 '24 20:05

@bfshi Hi, does this mean that in S2, if the input scales are [1x, 2x, 3x], only the 2x and 3x outputs are interpolated to 1x to get a normal output size? But from what I can see, for 2x and 3x only the batch size gets bigger; the input resolution is actually the same as the original.
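The bigger batch size is consistent with how the S2 paper describes the method: a larger-scale image is split into sub-images of the base size, and those sub-images go through the ViT as extra batch entries, so each forward pass still runs at the original resolution. Their feature maps are then stitched back into the large-scale grid and only afterwards average-pooled to the base size. Below is a minimal NumPy sketch of the split/merge step, assuming channel-first (B, C, H, W) arrays and sides divisible by the split count; the function names are hypothetical.

```python
import numpy as np


def split_into_tiles(images: np.ndarray, num_split: int) -> np.ndarray:
    """(B, C, H, W) -> (num_split**2 * B, C, H/num_split, W/num_split).

    Tiles are stacked along the batch axis in tile-major order (every image's
    top-left tile first), so each ViT pass sees the base resolution.
    """
    b, c, h, w = images.shape
    assert h % num_split == 0 and w % num_split == 0
    th, tw = h // num_split, w // num_split
    tiles = [images[:, :, i * th:(i + 1) * th, j * tw:(j + 1) * tw]
             for i in range(num_split) for j in range(num_split)]
    return np.concatenate(tiles, axis=0)


def merge_tiles(tiles: np.ndarray, num_split: int) -> np.ndarray:
    """Inverse of split_into_tiles; works on images or per-tile feature maps."""
    b = tiles.shape[0] // (num_split ** 2)
    # Stitch each row of tiles along width, then stack the rows along height.
    rows = [np.concatenate([tiles[(i * num_split + j) * b:(i * num_split + j + 1) * b]
                            for j in range(num_split)], axis=3)
            for i in range(num_split)]
    return np.concatenate(rows, axis=2)


# A 768x768 input with a 384 base size becomes 4 tiles of 384x384 in the batch.
img = np.random.default_rng(0).standard_normal((1, 3, 768, 768))
tiles = split_into_tiles(img, num_split=2)
print(tiles.shape)  # (4, 3, 384, 384)
```

Under this scheme a 3x scale (1152x1152 with a 384 base) would become 9 tiles. The "interpolation to 1x" happens on the merged output feature map, not on the input image.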

lucasjinreal • May 16 '24 02:05

This issue has been inactive for a while. Feel free to reopen it if the problem still exists.

Lyken17 • Feb 25 '25 09:02