
The inference on the AMBER dataset is very slow.

[Open] pspdada opened this issue 8 months ago • 6 comments

The AMBER dataset is a yes-or-no dataset, but its questions do not explicitly state "Please answer yes or no." As a result, during evaluation the model generates very long responses, which slows down the entire inference process. Is it possible to address this issue?

pspdada avatar Apr 26 '25 16:04 pspdada

Hi, thanks for pointing out the problem. I checked the original AMBER dataset; it does not include a suffix such as "Please answer yes or no." For consistency, we did not add this suffix either.

If you want to generate shorter responses, you can modify the dataset file manually or consider using a custom prompt for the evaluated model.
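
For illustration, a custom prompt could look roughly like the sketch below. The hook names (`use_custom_prompt`, `build_prompt`, `dump_image`) follow VLMEvalKit's `BaseModel` convention, but the exact import path and signatures should be checked against your model class:

```python
from vlmeval.vlm.base import BaseModel  # import path may differ across versions


class MyVLM(BaseModel):  # hypothetical model wrapper

    def use_custom_prompt(self, dataset):
        # Opt in to custom prompt building for AMBER only.
        return dataset == 'AMBER'

    def build_prompt(self, line, dataset=None):
        assert self.use_custom_prompt(dataset)
        tgt_path = self.dump_image(line, dataset)  # dump image(s), get local path(s)
        question = line['question']
        if dataset == 'AMBER':
            # Append the missing instruction so the model answers briefly.
            question += ' Please answer yes or no.'
        msgs = [dict(type='image', value=p) for p in tgt_path]
        msgs.append(dict(type='text', value=question))
        return msgs
```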

MaoSong2022 avatar Apr 27 '25 08:04 MaoSong2022

That somehow makes sense; I can add this additional instruction to the test prompt of AMBER.

kennymckormick avatar Apr 27 '25 12:04 kennymckormick

Maybe it can be handled another way? Such as setting max new tokens to 1 when evaluating a model on the AMBER benchmark?
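
Concretely, I mean something like the following (a sketch assuming a local HuggingFace-style model; `model`, `tokenizer`, and `inputs` are placeholders, not VLMEvalKit internals):

```python
# Cap generation at a single new token so the reply is just "Yes"/"No".
output_ids = model.generate(
    **inputs,
    max_new_tokens=1,  # hard cap: one new token only
    do_sample=False,   # greedy decoding, deterministic
)
# Decode only the newly generated token, skipping the prompt.
answer = tokenizer.decode(
    output_ids[0, inputs['input_ids'].shape[1]:],
    skip_special_tokens=True,
)
```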

pspdada avatar Apr 27 '25 12:04 pspdada

Hi, @pspdada . The problem has been resolved in https://github.com/open-compass/VLMEvalKit/pull/961.

kennymckormick avatar Apr 27 '25 12:04 kennymckormick

I do not recommend this alternative (capping max new tokens at 1), since most API VLMs may not say Yes / No at the beginning of their responses.
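
In that case, evaluation has to find the verdict anywhere in a free-form reply, along the lines of the hypothetical matcher below (not VLMEvalKit's actual implementation):

```python
import re


def extract_yes_no(response: str):
    """Find a yes/no verdict anywhere in a free-form reply, not only at
    the start. Returns None when the reply is ambiguous or missing one."""
    text = response.strip().lower()
    has_yes = re.search(r'\byes\b', text) is not None
    has_no = re.search(r'\bno\b', text) is not None
    if has_yes and not has_no:
        return 'yes'
    if has_no and not has_yes:
        return 'no'
    return None  # ambiguous; needs a fallback judge
```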

kennymckormick avatar Apr 27 '25 12:04 kennymckormick

@kennymckormick I can help with this.

MaoSong2022 avatar Apr 27 '25 12:04 MaoSong2022