Junming Yang
You can run with `--data MMBench_DEV_EN` or `--data MMBench_TEST_EN` to get MMBench predictions. The MMBench_DEV benchmark can be evaluated on your own device, while MMBench_TEST predictions need to be submitted to the official evaluation server for scoring.
For MMMU, please run with `--data MMMU_DEV_VAL`; it covers both the dev and validation splits. The dataset names and corresponding commands are listed in the README file.
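A minimal invocation sketch for both datasets; the model name here is illustrative, substitute whichever model you are evaluating:

```sh
# Evaluate on MMBench dev (scored locally) and on MMMU dev+val
python run.py --data MMBench_DEV_EN --model InternVL2-1B
python run.py --data MMMU_DEV_VAL --model InternVL2-1B
```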
Set `LMUData=<folder path>` in your `.env` file.
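For example, a sketch of the `.env` entry (the path is illustrative):

```sh
# .env — directory where benchmark data is stored
LMUData=/path/to/LMUData
```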
You can add a DeepSeek evaluator to your configs at `src/alpaca_eval/evaluators_configs/deepseek_v3_eval/configs.yaml`:

```yaml
deepseek_v3_eval:
  prompt_template: "alpaca_eval_clf_gpt4_turbo/alpaca_eval_clf.txt"
  fn_completions: "openai_completions"
  completions_kwargs:
    model_name: "deepseek-v3-241226"  # change to your corresponding model name
    max_tokens: 1...
```
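With that config in place, a sketch of running the evaluation against the new annotator (the outputs path is illustrative):

```sh
# Select the new evaluator via its config name
alpaca_eval --model_outputs outputs.json --annotators_config deepseek_v3_eval
```

Since the config uses `openai_completions`, you will likely also need to point the OpenAI-compatible client at DeepSeek's endpoint and supply your API key via environment variables.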
I'm hitting the same issue. Have you been able to solve it?
Hi, @Shijinghang. What error are you seeing? I tried to reproduce it (model: InternVL2-1B) but could not. Can you provide more information?
The ChartQA_TEST benchmark includes both ChartQA-H and ChartQA-M, 2,500 questions in total.
@lucasjinreal Which model and dataset are you evaluating? Can you provide more information? That would help us track down the problem.
InternVL2 requires `transformers==4.37.0`; please check your package version. InternVL2-76B is still under testing; we will officially support it and update `config.py` later.
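If your installed version differs, you can pin it:

```sh
pip install transformers==4.37.0
```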
I tried to reproduce this bug with the latest code and the same command; it appears to work normally.