LLaVA-NeXT
Fix prepare inputs labels for multimodal
- Add assert to make sure number of images == number of image tokens in inputs
- Fix the case where num_images == 0:
- We don't need to use image_features here
- We cannot unconditionally do cur_image_idx += 1: this runs into errors in many cases, for example when the batch contains 2 data points whose inputs contain no images
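
The check and the index handling described above can be sketched roughly as follows. This is a minimal illustration on plain Python lists, not the code from this PR: the helper names (count_image_tokens, split_images_per_sample) are hypothetical, the placeholder id IMAGE_TOKEN_INDEX = -200 is an assumption, and real inputs would be tensors rather than lists.

```python
# Assumed placeholder id marking an image position in input_ids.
IMAGE_TOKEN_INDEX = -200


def count_image_tokens(input_ids):
    """Count the image placeholder tokens in one sample."""
    return sum(1 for token in input_ids if token == IMAGE_TOKEN_INDEX)


def split_images_per_sample(batch_input_ids, images):
    """Assign images to samples, advancing the index only by images actually used.

    Hypothetical helper: raises AssertionError when the number of images
    does not match the number of image tokens in the batch.
    """
    total_tokens = sum(count_image_tokens(ids) for ids in batch_input_ids)
    assert total_tokens == len(images), (
        f"number of images ({len(images)}) != "
        f"number of image tokens ({total_tokens})"
    )

    cur_image_idx = 0
    per_sample = []
    for ids in batch_input_ids:
        num_images = count_image_tokens(ids)
        # For num_images == 0 this slice is empty and the index stays put,
        # so text-only samples never consume an image.
        per_sample.append(images[cur_image_idx:cur_image_idx + num_images])
        cur_image_idx += num_images
    return per_sample


if __name__ == "__main__":
    batch = [[1, IMAGE_TOKEN_INDEX, 2], [3, 4], [5, 6]]
    print(split_images_per_sample(batch, ["img0"]))
```

Advancing cur_image_idx by the per-sample image count, rather than by a fixed 1, is what keeps a batch with several text-only samples from indexing past the end of the image list.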
Hello, I added the assert, and then an error occurred. I set batch_size=1 and fed the model a mixed dataset that contains both samples with images and samples without images. When a no-image sample is fed, the number of image tokens in input_ids is 0, which is correct, but the number of images in the batch is 1. I can't figure out why. Can you give me some advice? Thank you!