
[Question] Misaligned image_grid_pinpoints

Open Forence1999 opened this issue 6 months ago • 0 comments

Question

Hi, thank you so much for your great open-source work!

I noticed that the config at https://huggingface.co/lmms-lab/llama3-llava-next-8b/blob/main/config.json sets image_grid_pinpoints as follows:

"image_grid_pinpoints": [ [ 336, 672 ], [ 672, 336 ], [ 672, 672 ], [ 1008, 336 ], [ 336, 1008 ] ],

but this seems to differ from what your blog post (https://llava-vl.github.io/blog/2024-01-30-llava-next/) states:

  • Increasing the input image resolution to 4x more pixels. This allows it to grasp more visual details. It supports three aspect ratios, up to 672x672, 336x1344, 1344x336 resolution.
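To make the mismatch concrete, here is a small Python sketch that tabulates both lists. It assumes 336 px is the base tile size (every value in the config is a multiple of 336). It keeps each pair in the order written, without claiming which number is width and which is height. `describe` and `summarize` are hypothetical helpers written for this comparison; they are not part of the LLaVA codebase.

```python
from fractions import Fraction

# Assumption: 336 px is the base tile size; all values below are multiples of it.
BASE = 336

# "image_grid_pinpoints" copied from the config.json quoted above.
config_pinpoints = [(336, 672), (672, 336), (672, 672), (1008, 336), (336, 1008)]

# Resolutions listed in the blog post, kept in the order written there.
blog_resolutions = [(672, 672), (336, 1344), (1344, 336)]


def describe(res):
    """Return (tile grid, pixel multiple of a BASE x BASE image, aspect ratio)."""
    a, b = res
    grid = (a // BASE, b // BASE)
    pixel_multiple = Fraction(a * b, BASE * BASE)
    aspect = Fraction(a, b)  # orientation-aware, so 1:4 and 4:1 count separately
    return grid, pixel_multiple, aspect


def summarize(name, resolutions):
    """Print each resolution's tile grid, pixel multiple, and aspect ratio."""
    print(name)
    for res in resolutions:
        grid, mult, aspect = describe(res)
        print(f"  {res[0]}x{res[1]}: grid {grid[0]}x{grid[1]}, "
              f"{mult}x pixels, aspect {aspect}")
    aspects = {describe(r)[2] for r in resolutions}
    print(f"  distinct aspect ratios: {len(aspects)}")


if __name__ == "__main__":
    summarize("config.json", config_pinpoints)
    summarize("blog post", blog_resolutions)
    print("only in config:", sorted(set(config_pinpoints) - set(blog_resolutions)))
    print("only in blog:", sorted(set(blog_resolutions) - set(config_pinpoints)))
```

Under these assumptions, the config covers five aspect ratios, with 2x to 4x the pixels of a 336x336 image. The blog's list has three aspect ratios, each at 4x. Also, 336x1344 and 1344x336 do not appear in the config at all.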

Am I misunderstanding something, or is this an oversight? Thanks!

Forence1999, Aug 04 '24 13:08