LLaVA-NeXT

Fix prepare inputs labels for multimodal

Open • khaimt opened this issue 1 year ago • 1 comment

  • Add an assert to make sure the number of images == the number of image tokens in the inputs
  • Fix the case where num_images == 0:
    • We don't need to use image_features here
    • We cannot unconditionally set cur_image_idx += 1, because it causes an error in many cases. For example, when a batch contains 2 data points whose inputs contain no images
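
The two points above can be sketched roughly as follows. This is a minimal illustration on plain Python lists, not the actual patch: `assign_images` and `count_image_tokens` are hypothetical helper names, and `IMAGE_TOKEN_INDEX = -200` is assumed to be the image placeholder id. The idea is to check the image count up front and to advance `cur_image_idx` only by the number of image tokens a sample really contains, so text-only samples consume nothing.

```python
# Assumed placeholder id for an image token in input_ids (an assumption
# for this sketch; check the constant your codebase actually uses).
IMAGE_TOKEN_INDEX = -200


def count_image_tokens(batch_input_ids):
    """Total number of image placeholder tokens across the batch."""
    return sum(ids.count(IMAGE_TOKEN_INDEX) for ids in batch_input_ids)


def assign_images(batch_input_ids, image_features):
    """Hypothetical helper: map each sample to the image features it uses.

    batch_input_ids: list of token-id lists, one per sample.
    image_features:  flat list with one entry per image in the batch.
    """
    num_image_tokens = count_image_tokens(batch_input_ids)
    # The assert from the first bullet: images and image tokens must match.
    assert num_image_tokens == len(image_features), (
        f"got {len(image_features)} images but "
        f"{num_image_tokens} image tokens in the inputs"
    )

    cur_image_idx = 0
    per_sample = []
    for ids in batch_input_ids:
        num_images = ids.count(IMAGE_TOKEN_INDEX)
        # For num_images == 0 this slice is empty and the index does not
        # move, so image_features is never touched for text-only samples.
        per_sample.append(image_features[cur_image_idx:cur_image_idx + num_images])
        cur_image_idx += num_images
    return per_sample
```

With this shape, a batch made up only of text samples passes an empty `image_features` list and never indexes into it, which is the case that fails when `cur_image_idx` is incremented unconditionally.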

khaimt avatar Jun 28 '24 15:06 khaimt

Hello, I added the assert and then an error occurred. I set batch_size=1 and fed the model a mixed dataset that has both samples with images and samples without images. When a no-image sample is fed in, the number of image tokens in input_ids is 0, which is correct, but the number of images in the batch is 1. I can't figure out why this happens. Can you give me some advice? Thank you!

shorlockhxk avatar Jan 17 '25 11:01 shorlockhxk