
[bug] It seems that LLaVA/llava/train/train.py has not been updated for v1.6

Open feiliya333 opened this issue 1 year ago • 4 comments

Describe the issue

For example, the current LLaVA/llava/train/train.py has no image_grid_pinpoints parameter in model_args, yet that parameter is used in llava/mm_utils.py, line 176: image = process_anyres_image(image, image_processor, model_cfg.image_grid_pinpoints)

feiliya333 avatar Feb 01 '24 16:02 feiliya333

I guess:

possible_resolutions = [(2,2), (1,2), (2,1), (1,3), (3,1), (1,4), (4,1)]
possible_resolutions = [(x * 336, y * 336) for x,y in possible_resolutions]
image_grid_pinpoints = possible_resolutions
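As a rough sketch of what this guess expands to, the snippet below scales each grid shape by a 336-pixel base size, as in the lines above. The helper name `build_grid_pinpoints` is hypothetical and is not part of the LLaVA code base; whether these are the values the released v1.6 models use is not confirmed here.

```python
def build_grid_pinpoints(grid_shapes, base_size=336):
    """Scale each (x, y) grid shape by base_size to get pixel resolutions.

    Hypothetical helper for illustration; not a function from LLaVA.
    """
    return [(x * base_size, y * base_size) for x, y in grid_shapes]


# Grid shapes taken from the guess in this comment.
grid_shapes = [(2, 2), (1, 2), (2, 1), (1, 3), (3, 1), (1, 4), (4, 1)]
image_grid_pinpoints = build_grid_pinpoints(grid_shapes)

for shape, resolution in zip(grid_shapes, image_grid_pinpoints):
    print(shape, "->", resolution)
```

The resulting list could then be assigned to the model config's image_grid_pinpoints attribute before calling process_anyres_image, assuming the rest of the training code accepts it.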

jayyoung0802 avatar Feb 02 '24 02:02 jayyoung0802

Is the code suggested by @jayyoung0802 meant for training all v1.6 models? Which models have you been able to train successfully with this code?

gameveloster avatar Feb 02 '24 06:02 gameveloster

The current code base does not support training v1.6 models yet. We'll release the training code and data with v1.6 models soon. Thanks.

haotian-liu avatar Feb 03 '24 15:02 haotian-liu

I guess:

possible_resolutions = [(2,2), (1,2), (2,1), (1,3), (3,1), (1,4), (4,1)]
possible_resolutions = [(x * 336, y * 336) for x,y in possible_resolutions]
image_grid_pinpoints = possible_resolutions

I ran into this problem again while reading the config of the llava-1.6 models. It seems that not all of the resolutions mentioned above are supported. Please see my newly opened issue https://github.com/haotian-liu/LLaVA/issues/1644 for further information.
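One way to check which of the guessed resolutions a given checkpoint lists is to compare them against the image_grid_pinpoints entry in its config. The sketch below assumes the config is JSON and stores that entry as a list of two-element lists; the helper name `unsupported_pinpoints` and the sample config content are hypothetical and are not copied from any real checkpoint.

```python
import json


def unsupported_pinpoints(guessed, config):
    """Return the guessed resolutions that config["image_grid_pinpoints"] lacks.

    Hypothetical helper for illustration; not a function from LLaVA.
    """
    supported = {tuple(p) for p in config.get("image_grid_pinpoints", [])}
    return [p for p in guessed if tuple(p) not in supported]


# Resolutions from the guess quoted above.
grid_shapes = [(2, 2), (1, 2), (2, 1), (1, 3), (3, 1), (1, 4), (4, 1)]
guessed = [(x * 336, y * 336) for x, y in grid_shapes]

# Made-up config content for demonstration only; load the real config.json
# of the checkpoint you are inspecting instead.
sample_config = json.loads(
    '{"image_grid_pinpoints": [[336, 672], [672, 336], [672, 672]]}'
)

missing = unsupported_pinpoints(guessed, sample_config)
print("Not listed in config:", missing)
```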

Forence1999 avatar Aug 04 '24 13:08 Forence1999