MathVista issues

Add .devcontainer, update GPT to use OpenAI >1.x, make Claude and Bard imports dynamics and optional, use HuggingFace datasets

I had been working more closely with this repo a few weeks ago and thought I would try to contribute some of the modifications back for others to benefit. ##...

mattmazzola

Return None when str coercion fails or when empty

- Returning None when extraction is empty prevents choosing one of the choices based on Levenshtein distance - Also return None on str coercion failure since returning empty string would...

mattmazzola

Possible Bug in calculate_score.py, empty responses or extractions results in non-empty normalized extraction due to `get_most_similar`

I was debugging an issue with our model outputting empty responses for all questions and noticed the accuracy score was still 22% when I expected it should be 0%. I...

mattmazzola

What was the intention of `image_path` in the model files such as gpt.py?

The `get_response` function takes `image_path` but the variable is unused. I assumed it would be useful if targeting another LMM like GPT4V; however, the code to set the image path...

mattmazzola

Inefficient file write operations - writing entire results dictionary to output path in loops

More of an optimization rather than bug or issue with evaluation, but I think worth noting in case someone thinks it is worthy to address. generate_response.py and extract_answer.py use an...

mattmazzola

Redundant implementations of `get_chat_response`

There is an implementation in `utilities#get_chat_response` and `models/gpt#get_response`. These could be unified https://github.com/lupantech/MathVista/blob/82f68d09b4cbffe9d0dfd7542c599810e30c9a99/utilities.py#L159-L199 https://github.com/lupantech/MathVista/blob/82f68d09b4cbffe9d0dfd7542c599810e30c9a99/models/gpt.py#L16-L55

mattmazzola

Questions about GPT-4O score on MathVista?

1

Hi I was wondering the score of GPT-4O, it's 63.8 on testmini. But I could only get around 55 at my side. Also I got little bit lower score for...

Luodian

Update calculate_score.py

socre -> score

eltociear

prefetch_rate

Hello, could you please explain how prefetch_rate is calculated, what it represents, and what it can indicate?

wangchunhui06

Could anyone update on how to use extract answer

1

``` [18:38:19] INFO [root] MathVista: Extract Answers - Start usage: extract_answer.py [-h] [--results_file_path RESULTS_FILE_PATH] [--response_label RESPONSE_LABEL] [--max_num_problems MAX_NUM_PROBLEMS] [--quick_extract] [--rerun] [--save_every SAVE_EVERY] [--azure_openai_api_endpoint AZURE_OPENAI_API_ENDPOINT] [--azure_openai_api_key AZURE_OPENAI_API_KEY] [--azure_openai_api_version AZURE_OPENAI_API_VERSION] [--azure_openai_model AZURE_OPENAI_MODEL]...

YerongLi

MathVista
MathVista copied to clipboard

Metadata

Add .devcontainer, update GPT to use OpenAI >1.x, make Claude and Bard imports dynamics and optional, use HuggingFace datasets

Return None when str coercion fails or when empty

Possible Bug in calculate_score.py, empty responses or extractions results in non-empty normalized extraction due to `get_most_similar`

What was the intention of `image_path` in the model files such as gpt.py?

Inefficient file write operations - writing entire results dictionary to output path in loops

Redundant implementations of `get_chat_response`

Questions about GPT-4O score on MathVista?

Update calculate_score.py

prefetch_rate

Could anyone update on how to use extract answer

← Metadata

Owner

Metadata

MathVista MathVista copied to clipboard

Metadata

← Metadata

Owner

Metadata

MathVista
MathVista copied to clipboard