InternVL
I observed a difference between my GPT-5 evaluation results on the ERQA dataset and the ones reported.
Thank you for your work! I have a question about the GPT-5 evaluation on the ERQA dataset in the latest InternVL3.5 paper. The reported GPT-5 score is 65.7, which seems quite high; in my own evaluation I obtained 55.44. Could you please share the evaluation details and API settings you used? Did you modify the prompt? For reference, I evaluated GPT-5 using the original ERQA codebase.
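For context, my per-question call looks roughly like the sketch below. This is only an illustration of my own setup, not the paper's protocol: the model identifier, the prompt wording, and the use of the OpenAI Chat Completions API are my assumptions, and the actual prompt/settings in the original ERQA codebase may differ.

```python
import base64
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment


def encode_image(path: str) -> str:
    """Read a local image and return it as a base64 data URL."""
    with open(path, "rb") as f:
        return "data:image/png;base64," + base64.b64encode(f.read()).decode()


def ask_gpt5(question: str, options: list[str], image_paths: list[str]) -> str:
    """Send one ERQA-style multiple-choice item to GPT-5 and return the raw reply."""
    # Prompt wording is my own; the original ERQA codebase may phrase this differently.
    prompt = (
        question
        + "\n"
        + "\n".join(f"{chr(65 + i)}. {opt}" for i, opt in enumerate(options))
        + "\nAnswer with the letter of the correct option."
    )
    content = [{"type": "text", "text": prompt}]
    for path in image_paths:
        content.append({"type": "image_url", "image_url": {"url": encode_image(path)}})
    response = client.chat.completions.create(
        model="gpt-5",  # model identifier used in my runs; sampling parameters left at API defaults
        messages=[{"role": "user", "content": content}],
    )
    return response.choices[0].message.content


# Example usage with a hypothetical item and local image file:
# print(ask_gpt5("Which object is closest to the robot gripper?",
#                ["red cube", "blue bowl", "green mug", "none of the above"],
#                ["example.png"]))
```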
Thank you for your interest in our work. We reported this score based on the information provided in their official blog.