LLaVA-NeXT Fix prepare inputs labels for multimodal

Open khaimt opened this issue 1 year ago • 1 comments

Add an assert to make sure the number of images equals the number of image tokens in the inputs.
Fix the case where num_images == 0:
- We don't need to use image_features here.
- We cannot set cur_image_idx += 1, because this causes errors in many cases. For example, when a batch contains 2 data points with no images in their inputs.
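
The two points above can be illustrated with a minimal sketch in plain Python. It is not the repository's implementation: the helper name `assign_image_features`, the use of plain lists instead of tensors, and the placeholder value `IMAGE_TOKEN_INDEX = -200` are assumptions made for illustration only.

```python
# Assumed placeholder id that marks an image position in input_ids.
IMAGE_TOKEN_INDEX = -200


def assign_image_features(batch_input_ids, image_features):
    """Hypothetical helper: hand each sample the image features it refers to.

    batch_input_ids: list of token-id lists, one per sample.
    image_features: flat list with one entry per image in the whole batch.
    """
    total_image_tokens = sum(
        ids.count(IMAGE_TOKEN_INDEX) for ids in batch_input_ids
    )
    # The assert described above: images and image tokens must match.
    assert total_image_tokens == len(image_features), (
        f"{len(image_features)} images but {total_image_tokens} image tokens"
    )

    cur_image_idx = 0
    per_sample = []
    for ids in batch_input_ids:
        num_images = ids.count(IMAGE_TOKEN_INDEX)
        if num_images == 0:
            # Text-only sample: do not touch image_features and do not
            # advance cur_image_idx, since no image was consumed.
            per_sample.append([])
            continue
        per_sample.append(
            image_features[cur_image_idx:cur_image_idx + num_images]
        )
        cur_image_idx += num_images
    return per_sample
```

With this layout, a batch of two text-only samples and an empty `image_features` list passes through without indexing errors, while a text-only sample paired with one image trips the assert.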

Jun 28 '24 15:06 khaimt

Hello, I added the assert and then an error occurred. I set batch_size=1 and fed the model a mixed dataset that contains both samples with images and samples without images. When a no-image sample is fed, the number of image tokens in input_ids is 0, which is correct, but the number of images in the batch is 1. I can't figure out why this happens. Can you give me some advice? Thank you!

Jan 17 '25 11:01 shorlockhxk
