Inference on the AMBER dataset is very slow.
The AMBER dataset is a yes-or-no dataset, but the questions in this dataset do not explicitly state: "Please answer yes or no." As a result, during evaluation, the model generates very long responses, which slows down the entire inference process. Is it possible to address this issue?
Hi, thanks for pointing out the problem. I checked the original AMBER dataset, and it does not include a suffix like "Please answer yes or no." For consistency, we did not add this suffix either.
If you want shorter responses, you can modify the dataset file manually or use a custom prompt for the evaluated model.
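For the first option, here is a rough sketch of what the manual edit could look like, assuming the local AMBER file is a VLMEvalKit-style TSV with a `question` column (the path is a placeholder, adjust it to wherever your copy lives):

```python
# Hypothetical helper: append the yes/no instruction to every question in a
# local copy of the AMBER TSV. Assumes a VLMEvalKit-style TSV with a
# `question` column; the path is a placeholder.
import pandas as pd

path = "LMUData/AMBER.tsv"  # adjust to your local dataset location
df = pd.read_csv(path, sep="\t")
df["question"] = df["question"].astype(str) + " Please answer yes or no."
df.to_csv(path, sep="\t", index=False)
```

Note that if the toolkit verifies the dataset file's checksum, you may need to register the modified copy under a different name.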
Maybe it could be handled another way, such as setting max new tokens to 1 when evaluating a model on the AMBER benchmark?
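Just to illustrate the idea with a plain Hugging Face model (the model name and prompt are placeholders, not how VLMEvalKit actually runs AMBER):

```python
# Illustration only: a hard one-token cap on generation.
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = "Is there a dog in the image? Answer:"
inputs = tok(prompt, return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=1, pad_token_id=tok.eos_token_id)
first_token = tok.decode(out[0][inputs["input_ids"].shape[1]:])
print(first_token)  # not guaranteed to be "Yes" or "No"
```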
I do not recommend this alternative, since most API VLMs may not say Yes / No at the beginning of their responses.
That somehow makes sense; I can add this additional instruction to the test prompt of AMBER.
@kennymckormick I can help to do this.
Hi, @pspdada. The problem has been resolved in https://github.com/open-compass/VLMEvalKit/pull/961.