[Bug] Long text evaluation parameters are not clear
Prerequisite
- [X] I have searched Issues and Discussions but cannot get the expected help.
- [X] The bug has not been fixed in the latest version.
Type
I'm evaluating with the officially supported tasks/models/datasets.
Environment
Python 3.10.1, OpenCompass 0.2.3, vllm 0.2.3
Reproduces the problem - code/configuration sample
configs/models/chatglm/vllm_chatglm2_6b_32k.py:

from opencompass.models import VLLM

models = [
    dict(
        type=VLLM,
        abbr='chatglm2-6b-32k-vllm',
        path='THUDM/chatglm2-6b-32k',
        max_out_len=512,
        max_seq_len=4096,
        batch_size=32,
        generation_kwargs=dict(temperature=0),
        run_cfg=dict(num_gpus=1, num_procs=1),
    )
]
Reproduces the problem - command or script
python run.py --model vllm_chatglm2_6b_32k --datasets longbench leval
Reproduces the problem - error message
My evaluation results differ from the documented long-text evaluation scores by about 20 points, and the documented scores cannot be reproduced.
- Should the `max_seq_len` and `max_out_len` parameters be modified in any way?
Other information
No response
For optimal performance, it is advisable to configure the max_seq_len parameter to the highest value feasible, such as 32768 or even higher if possible. As for max_out_len, it typically comes with a preset default value within the dataset configuration; you can adjust it to 256, or simply retain the default setting.
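For reference, a minimal sketch of the adjusted config following this advice, assuming the same VLLM model entry posted above (the 32768 value is only the suggestion from this thread, not an official recommendation):

```python
from opencompass.models import VLLM

models = [
    dict(
        type=VLLM,
        abbr='chatglm2-6b-32k-vllm',
        path='THUDM/chatglm2-6b-32k',
        max_seq_len=32768,   # raised from 4096 so long-text inputs are not truncated
        max_out_len=512,     # optional: set to 256, or drop to use the dataset config's default
        batch_size=32,
        generation_kwargs=dict(temperature=0),
        run_cfg=dict(num_gpus=1, num_procs=1),
    )
]
```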
Thank you very much. I reproduced most of the scores.
I also need to ask: for the subsets scored with rouge1, rouge2, rougeL, and rougeLsum, the score differences are still very large.
- What could be the reason?
- Which metrics are used in the leaderboard ranking?
@liushz