Junming Yang
You can run with `--data MMBench_DEV_EN` or `--data MMBench_TEST_EN` to get MMBench predictions. The MMBench_DEV benchmark can be evaluated on your own device, while MMBench_TEST predictions need to be submitted to the official evaluation server for scoring.
For MMMU, please run with `--data MMMU_DEV_VAL`; it covers both the dev and validation splits. The dataset names and corresponding commands are listed in the README file.
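A minimal invocation sketch for both datasets; the model name here is illustrative, substitute whichever model you are evaluating:

```sh
# Evaluate on MMBench dev (scored locally) and on MMMU dev+val
python run.py --data MMBench_DEV_EN --model InternVL2-1B
python run.py --data MMMU_DEV_VAL --model InternVL2-1B
```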
Set `LMUData=<folder path>` in your `.env` file.
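For example, a sketch of the `.env` entry (the path is illustrative):

```sh
# .env — directory where benchmark data is stored
LMUData=/path/to/LMUData
```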
You can add a DeepSeek evaluator to your configs at `src/alpaca_eval/evaluators_configs/deepseek_v3_eval/configs.yaml`:

```yaml
deepseek_v3_eval:
  prompt_template: "alpaca_eval_clf_gpt4_turbo/alpaca_eval_clf.txt"
  fn_completions: "openai_completions"
  completions_kwargs:
    model_name: "deepseek-v3-241226"  # change to your corresponding model name
    max_tokens: 1...
```
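With that config in place, a sketch of running the evaluation against the new annotator (the outputs path is illustrative):

```sh
# Select the new evaluator via its config name
alpaca_eval --model_outputs outputs.json --annotators_config deepseek_v3_eval
```

Since the config uses `openai_completions`, you will likely also need to point the OpenAI-compatible client at DeepSeek's endpoint and supply your API key via environment variables.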
I'm hitting the same issue. Have you been able to solve it?
Hi, @Shijinghang. What error are you seeing? I tried to reproduce it (model: InternVL2-1B) but could not. Can you provide more information?
The ChartQA_TEST benchmark includes both ChartQA-H and ChartQA-M, 2,500 questions in total.
@lucasjinreal Which model and dataset are you evaluating? Can you provide more information? That would help us track down the problem.
InternVL2 requires `transformers==4.37.0`; please check your package version. InternVL2-76B is still under testing; we will officially support it and update `config.py` later.
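If your installed version differs, you can pin it:

```sh
pip install transformers==4.37.0
```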
I tried to reproduce this bug with the latest code and the same command; it appears to work normally.