MathVista
MathVista: data, code, and evaluation for Mathematical Reasoning in Visual Contexts
I had been working more closely with this repo a few weeks ago and thought I would try to contribute some of the modifications back for others to benefit. ##...
- Returning None when extraction is empty prevents choosing one of the choices based on Levenshtein distance
- Also return None on str coercion failure since returning empty string would...
I was debugging an issue with our model outputting empty responses for all questions and noticed the accuracy score was still 22% when I expected it should be 0%. I...
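The failure mode described above can be illustrated with a minimal sketch. It assumes the evaluation maps a free-form extraction to the nearest multiple-choice option by edit distance. The names `levenshtein`, `closest_choice`, and `normalize_choice` are hypothetical helpers written for this illustration, not the repository's actual functions.

```python
from typing import Optional, Sequence


def levenshtein(a: str, b: str) -> int:
    """Edit distance between two strings (row-by-row dynamic programming)."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def closest_choice(extraction: str, choices: Sequence[str]) -> str:
    """Unguarded nearest match: always returns some choice, even for ''."""
    return min(choices, key=lambda choice: levenshtein(extraction, choice))


def normalize_choice(extraction: object, choices: Sequence[str]) -> Optional[str]:
    """Guarded variant (assumed behaviour): None when nothing was extracted."""
    if extraction is None:
        return None
    try:
        text = str(extraction).strip()
    except Exception:
        # str coercion failed, so treat it as "no answer" instead of ""
        return None
    if not text:
        return None
    return closest_choice(text, choices)


choices = ["12", "7", "105"]
unguarded = closest_choice("", choices)   # the shortest choice wins by default
guarded = normalize_choice("", choices)   # None, so it can be scored as wrong
```

With an empty extraction, the unguarded nearest match still returns one of the choices (here the shortest one), which is one way an all-empty run could score above 0%. Returning `None` lets the scorer count such responses as incorrect.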
The `get_response` function takes `image_path` but the variable is unused. I assumed it would be useful if targeting another LMM like GPT4V; however, the code to set the image path...
More of an optimization than a bug or an issue with evaluation, but I think it is worth noting in case someone considers it worth addressing. `generate_response.py` and `extract_answer.py` use an...
There are implementations in both `utilities#get_chat_response` and `models/gpt#get_response`. These could be unified:
https://github.com/lupantech/MathVista/blob/82f68d09b4cbffe9d0dfd7542c599810e30c9a99/utilities.py#L159-L199
https://github.com/lupantech/MathVista/blob/82f68d09b4cbffe9d0dfd7542c599810e30c9a99/models/gpt.py#L16-L55
Hi, I was wondering about the score of GPT-4o: it's 63.8 on testmini, but I could only get around 55 on my side. I also got a slightly lower score for...
socre -> score