MathVista
Possible bug in calculate_score.py: empty responses or extractions result in a non-empty normalized extraction due to `get_most_similar`
I was debugging an issue where our model output empty responses for all questions, and I noticed the accuracy score was still 22% when I expected 0%.
I debugged further and found that for multi_choice questions there is a path that computes the Levenshtein distance but doesn't guard against empty inputs, so it outputs a valid choice regardless.
(Likely the choice with the fewest characters, since that has the minimum edit distance from an empty string, or the first choice if all are the same length.)
https://github.com/lupantech/MathVista/blob/82f68d09b4cbffe9d0dfd7542c599810e30c9a99/evaluation/calculate_score.py#L45-L51
https://github.com/lupantech/MathVista/blob/82f68d09b4cbffe9d0dfd7542c599810e30c9a99/evaluation/calculate_score.py#L14-L20
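To illustrate why an empty prediction still maps to a choice, here is a minimal standalone sketch. It is not the repository's code: `levenshtein` and `most_similar_unguarded` are hypothetical stand-ins written for this example, assuming the helper simply picks the choice with the smallest edit distance. The distance from an empty string to any choice equals that choice's length, so the shortest choice always wins, and ties go to the first one.

```python
from typing import List


def levenshtein(a: str, b: str) -> int:
    """Dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def most_similar_unguarded(prediction: str, choices: List[str]) -> str:
    # Hypothetical stand-in: return the choice closest to the prediction.
    # min() keeps the first element on ties.
    return min(choices, key=lambda choice: levenshtein(prediction, choice))


# An empty prediction is never rejected; it silently becomes a "valid" answer.
shortest = most_similar_unguarded("", ["12 cm", "7", "15.5"])
first_on_tie = most_similar_unguarded("", ["A", "B", "C"])
```

If the choice selected this way happens to be the ground truth for some questions, those questions are counted as correct, which would be consistent with a non-zero accuracy on all-empty responses.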
I also noticed some questionable exception handling when coercing the input value to a string: on failure it assigns an empty string and continues. I think it should exit early and return None instead. Assigning an empty string could further contribute to the issue above for multiple-choice problems where the extraction is not a string.
https://github.com/lupantech/MathVista/blob/82f68d09b4cbffe9d0dfd7542c599810e30c9a99/evaluation/calculate_score.py#L30-L36
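One possible shape for a fix, as a rough sketch only: return None as soon as the extraction is missing, cannot be coerced to a string, or is empty after stripping, and only then fall back to similarity matching. The name `guarded_normalize` and the `matcher` parameter are hypothetical and introduced just for this example; `matcher` stands for whatever closest-choice function the script uses.

```python
from typing import Callable, List, Optional


def guarded_normalize(
    extraction: object,
    choices: List[str],
    matcher: Callable[[str, List[str]], str],
) -> Optional[str]:
    """Return a matched choice, or None when there is nothing to match."""
    if extraction is None:
        return None
    try:
        text = str(extraction).strip()
    except Exception:
        # Exit early instead of substituting an empty string.
        return None
    if not text:
        # An empty extraction must not be mapped onto a choice.
        return None
    if text in choices:
        return text
    # Only non-empty text reaches the similarity fallback.
    return matcher(text, choices)
```

With a guard like this, an empty or unconvertible extraction would be scored as incorrect rather than being assigned the nearest choice.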
Video Demonstration
https://youtu.be/vj07WRvcLDw