Potential bug in mm_utils.py process_image function

Open · hubenjm opened this issue 1 year ago · 1 comment

When data_args.image_aspect_ratio is set to 'resize', mm_utils.process_image appears to return the image as a PIL.Image.Image object, which has no shape attribute. See https://github.com/Efficient-Large-Model/VILA/blob/main/llava/mm_utils.py#L168

Stage 1 alignment training uses the LazySupervisedDataset class in llava/data/dataset.py, and its __getitem__ method reads image.shape here: https://github.com/Efficient-Large-Model/VILA/blob/main/llava/data/dataset.py#L834

Accessing shape on a PIL image raises an AttributeError, which crashes training. Should we simply add the line image = processor.preprocess(image, return_tensors="pt")["pixel_values"][0] below line 168 of mm_utils.py (https://github.com/Efficient-Large-Model/VILA/blob/main/llava/mm_utils.py#L168), so that the 'resize' branch also returns a tensor?
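Below is a minimal sketch of the proposed change. It is a hypothetical, simplified stand-in for process_image, not the actual VILA source. The name process_image_sketch and the target_size parameter are illustrative only. The sketch assumes the image has a PIL-style resize((width, height)) method and that the processor exposes a Hugging Face-style preprocess(image, return_tensors="pt") call returning a mapping with a "pixel_values" batch, as in the line quoted above. The key point is that the 'resize' branch no longer returns early. Every path ends in preprocessing, so the caller always gets something with a shape attribute.

```python
# Hypothetical, simplified sketch of the proposed fix; NOT the actual VILA source.
# Assumptions: `image` exposes a PIL-style .resize((width, height)) method, and
# `processor` exposes a Hugging Face-style .preprocess(image, return_tensors="pt")
# returning a mapping whose "pixel_values" entry is a batch of image tensors.
from typing import Any, Tuple


def process_image_sketch(
    image: Any,
    processor: Any,
    image_aspect_ratio: str,
    target_size: Tuple[int, int],  # hypothetical (width, height) input resolution
) -> Any:
    if image_aspect_ratio == "resize":
        # Resize to the model's input resolution, but do NOT return here:
        # returning now would hand the caller a PIL image with no .shape.
        image = image.resize(target_size)
    # Other aspect-ratio modes (e.g. padding) are omitted from this sketch.
    # Every path ends with preprocessing, so the result is always one item
    # of the "pixel_values" batch, which has a .shape attribute.
    return processor.preprocess(image, return_tensors="pt")["pixel_values"][0]
```

With a real PIL image and a Hugging Face image processor, the returned item is typically a (channels, height, width) tensor, so the image.shape access in LazySupervisedDataset would no longer fail.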

hubenjm, May 09 '24 22:05

Seems valid, we will verify on our end and make the changes.