LLaVA-NeXT

About data processing

Open • starhiking opened this issue 1 year ago • 1 comment

Is the grounding data labeled before or after the image goes through the data processor?

For example, an image may need padding to make its width equal to its height, and the openai/clip-vit processor config includes do_center_crop. These operations change the grounding locations, but the labels in the conversations remain the same. So I wonder whether this has an impact on training.
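To make the concern concrete, here is a minimal sketch of how a box would move under the two operations. It assumes boxes are stored as normalized (x1, y1, x2, y2) relative to the original image and that padding is symmetric (image centered on a square canvas); both are assumptions for illustration, not confirmed behavior of this repo. The helper names pad_to_square_box and center_crop_box are hypothetical and do not exist in the codebase.

```python
def pad_to_square_box(box, width, height):
    """Map a normalized (x1, y1, x2, y2) box from the original image onto the
    image padded symmetrically to a square of side max(width, height).
    Assumption: the image is centered on the padded canvas."""
    side = max(width, height)
    off_x = (side - width) / 2   # padding added on the left
    off_y = (side - height) / 2  # padding added on the top
    x1, y1, x2, y2 = box
    return (
        (x1 * width + off_x) / side,
        (y1 * height + off_y) / side,
        (x2 * width + off_x) / side,
        (y2 * height + off_y) / side,
    )


def center_crop_box(box, side, crop):
    """Map a normalized box on a side x side image onto a centered
    crop x crop window, clipping coordinates that fall outside it."""
    off = (side - crop) / 2  # pixels removed from each border

    def remap(v):
        return min(max((v * side - off) / crop, 0.0), 1.0)

    return tuple(remap(v) for v in box)


# A 640x480 image padded to 640x640: x is unchanged, y is squeezed toward the center.
padded = pad_to_square_box((0.25, 0.25, 0.75, 0.75), 640, 480)
print(padded)

# A 256x256 image center-cropped to 224x224: boxes near the border get clipped.
cropped = center_crop_box((0.25, 0.5, 1.0, 1.0), 256, 224)
print(cropped)
```

If the labels were annotated on the original image and the processor applies either step, the stored coordinates no longer line up with what the vision encoder sees unless a remapping like this is applied, or the labels were already defined on the padded image.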

starhiking • Sep 11 '24 12:09

Hi, may I ask where this grounding dataset comes from? Also, does LLaVA-OneVision have any special tokens for bounding boxes, or is there a preprocessing function for boxes? I didn't see one in the code.

zyandtom • Jan 17 '25 06:01