added evaluation for multiple images

Open KansaiTraining opened this issue 5 months ago • 0 comments

Currently, the function eval_model(args) in run_llava.py lets us query only a single image, even though image_parser returns a list and process_images also accepts a list. The problem is that process_images (in mm_utils.py) builds a new list, new_images, but then line 181 of the code, new_images = torch.stack(new_images, dim=0), results in only the first image being processed.

Therefore, I wrote the function eval_multiple, which permits the evaluation of multiple images. For example, we can use it like this:

args = type('Args', (), {
    "model_path": model_path,
    "model_base": None,
    "model_name": get_model_name_from_path(model_path),
    "query": prompt,
    "conv_mode": None,
    "image_file": image_files,  # <-- this is actually a list
    "sep": ",",
    "temperature": 0,
    "top_p": None,
    "num_beams": 1,
    "max_new_tokens": 512
})()
response = eval_multiple(args)

# Collect the (image_file, description) pairs into column lists
data = {"image_file": [], "description": []}
for image_file, description in response:
    data['image_file'].append(image_file)
    data['description'].append(description)
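
The body of eval_multiple is not shown in this description. A minimal sketch of the per-image loop it implies, assuming a hypothetical describe_one callable that stands in for the single-image model call (the names split_image_files, eval_each, and fake_describe are illustrative and not part of LLaVA), might look like this:

```python
from typing import Callable, Iterable, List, Tuple, Union


def split_image_files(image_file: Union[str, Iterable[str]], sep: str = ",") -> List[str]:
    """Accept either a separator-joined string or an iterable of paths."""
    if isinstance(image_file, str):
        return [p.strip() for p in image_file.split(sep) if p.strip()]
    return list(image_file)


def eval_each(
    image_file: Union[str, Iterable[str]],
    describe_one: Callable[[str], str],
    sep: str = ",",
) -> List[Tuple[str, str]]:
    """Run the single-image call once per file and pair each path with its output."""
    return [(path, describe_one(path)) for path in split_image_files(image_file, sep)]


# Hypothetical stand-in for the real model call; no model is loaded here.
def fake_describe(path: str) -> str:
    return f"description of {path}"


pairs = eval_each(["a.jpg", "b.jpg"], fake_describe)
data = {
    "image_file": [path for path, _ in pairs],
    "description": [desc for _, desc in pairs],
}
```

Running the model once per image avoids depending on how a batch of images is stacked, at the cost of one generation call per file.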

I also added a function to conversation.py that lets us get a copy of the conversation, so that we can modify its system prompt if we wish.
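
As a rough illustration of why a copy matters here, the sketch below uses a hypothetical Conversation dataclass and TEMPLATES registry (stand-ins, not the actual classes in conversation.py). It shows that editing the system prompt on a deep copy leaves the shared template untouched:

```python
import copy
from dataclasses import dataclass, field
from typing import List


@dataclass
class Conversation:  # hypothetical stand-in, not the class in conversation.py
    system: str
    roles: tuple = ("USER", "ASSISTANT")
    messages: List[list] = field(default_factory=list)


# Hypothetical registry of shared templates.
TEMPLATES = {"default": Conversation(system="You are a helpful assistant.")}


def get_conversation_copy(name: str) -> Conversation:
    """Return a deep copy so edits never touch the shared template."""
    return copy.deepcopy(TEMPLATES[name])


conv = get_conversation_copy("default")
conv.system = "Describe each image in one sentence."
conv.messages.append(["USER", "<image>\nWhat is shown?"])
```

A deep copy is used in this sketch because the message list is mutable; a shallow copy would share it with the template.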

KansaiTraining, Sep 21 '24 03:09