VLMEvalKit
How to use a locally deployed judge model, such as Qwen3-8B?
I followed the steps in the provided link to deploy Qwen3-8B locally as a judge model. However, when I evaluate the MMBench_DEV_EN_V11 dataset, the run fails on this assertion at line 263 of vlmeval/dataset/image_mcq.py:
assert model in ['chatgpt-0125', 'exact_matching', 'gpt-4-0125']
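To show what I think is happening, here is a minimal sketch of the check in isolation. It assumes the judge name I pass reaches the assertion unchanged. The name `qwen3-8b` and the helper `check_judge` are hypothetical and used only for illustration; they are not taken from VLMEvalKit.

```python
# Judge names accepted by the assertion quoted above.
ALLOWED_JUDGES = ['chatgpt-0125', 'exact_matching', 'gpt-4-0125']


def check_judge(model: str) -> bool:
    """Hypothetical helper: True if `model` would pass the assertion."""
    return model in ALLOWED_JUDGES


if __name__ == "__main__":
    # 'qwen3-8b' is an assumed name for the locally deployed judge.
    for name in ['chatgpt-0125', 'exact_matching', 'qwen3-8b']:
        print(name, check_judge(name))
```

If this reading is right, any judge name outside those three strings is rejected before the locally deployed model is ever called.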
I noticed that many datasets only support chatgpt-0125 and gpt-4-0125 as judge models. If that’s the case, what is the purpose of the local judge model deployment tutorial in Quickstart.md? How can I actually use a locally deployed judge model?