LLaVA
Added evaluation for multiple images
Currently, the function eval_model(args) in run_llava.py lets us query only a single image (although, in theory, image_parser returns a list and process_images also receives a list). The problem is that process_images (in mm_utils.py) builds a new list, new_images, but then line 181 of the code, new_images = torch.stack(new_images, dim=0), causes only the first image to be processed.
Therefore, I wrote the function eval_multiple, which allows the evaluation of multiple images.
For example, we can use it like this:
    args = type('Args', (), {
        "model_path": model_path,
        "model_base": None,
        "model_name": get_model_name_from_path(model_path),
        "query": prompt,
        "conv_mode": None,
        "image_file": image_files,  # <-- this is actually a list
        "sep": ",",
        "temperature": 0,
        "top_p": None,
        "num_beams": 1,
        "max_new_tokens": 512
    })()

    # eval_multiple yields one (image_file, description) pair per image
    response = eval_multiple(args)

    # Collect the results column-wise
    data = {"image_file": [], "description": []}
    for image_file, description in response:
        data['image_file'].append(image_file)
        data['description'].append(description)
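To make the per-image idea concrete, here is a minimal sketch of how such an evaluation loop could be organized. It is not the code from this PR: eval_each_image and fake_describe are hypothetical names, and fake_describe only stands in for a real model call. The sketch assumes the images are given either as a list or as a single string joined by sep, and that each image is queried on its own.

```python
from typing import Callable, List, Sequence, Tuple, Union


def eval_each_image(
    image_files: Union[str, Sequence[str]],
    describe: Callable[[str], str],
    sep: str = ",",
) -> List[Tuple[str, str]]:
    """Run `describe` once per image and pair each path with its output."""
    # Accept either a ready-made list or a single separator-joined string.
    if isinstance(image_files, str):
        image_files = [p.strip() for p in image_files.split(sep) if p.strip()]
    return [(path, describe(path)) for path in image_files]


def fake_describe(path: str) -> str:
    # Hypothetical stand-in for the model call; it only echoes the file name.
    return f"description of {path}"


results = eval_each_image(["a.jpg", "b.jpg"], fake_describe)

# Same column-wise collection as in the usage example above.
data = {"image_file": [], "description": []}
for image_file, description in results:
    data["image_file"].append(image_file)
    data["description"].append(description)
```

Returning (image_file, description) pairs keeps each answer tied to the image it came from, which is what the collection loop relies on.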
I also added a function to conversation.py that lets us get a copy of the conversation, so that we can modify the system prompt of the conversation if we wish.
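As a rough illustration of why working on a copy is useful, the sketch below uses a simplified, hypothetical Conversation dataclass and a hypothetical helper get_conversation_copy; neither is the actual code in conversation.py. It assumes the conversation holds a system prompt and a list of messages, and shows that editing the copy leaves the shared template untouched.

```python
import copy
from dataclasses import dataclass, field
from typing import List


@dataclass
class Conversation:
    # Simplified, hypothetical stand-in for the class in conversation.py.
    system: str
    roles: tuple = ("USER", "ASSISTANT")
    messages: List[list] = field(default_factory=list)


def get_conversation_copy(conv: Conversation) -> Conversation:
    # Deep copy so edits to the copy never leak into the shared template.
    return copy.deepcopy(conv)


template = Conversation(system="You are a helpful assistant.")

conv = get_conversation_copy(template)
conv.system = "Describe each image in one sentence."
conv.messages.append(["USER", "What is in the image?"])
```

A deep copy is used here because the message list is mutable; a shallow copy would share that list with the template.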