
[Bug] Long text evaluation parameters are not clear

Open bullw opened this issue 10 months ago • 3 comments

Prerequisite

Type

I'm evaluating with the officially supported tasks/models/datasets.

Environment

Python 3.10.1, OpenCompass 0.2.3, vLLM 0.2.3

Reproduces the problem - code/configuration sample

configs/models/chatglm/vllm_chatglm2_6b_32k.py:

```python
from opencompass.models import VLLM

models = [
    dict(
        type=VLLM,
        abbr='chatglm2-6b-32k-vllm',
        path='THUDM/chatglm2-6b-32k',
        max_out_len=512,
        max_seq_len=4096,
        batch_size=32,
        generation_kwargs=dict(temperature=0),
        run_cfg=dict(num_gpus=1, num_procs=1),
    )
]
```

Reproduces the problem - command or script

python run.py --model vllm_chatglm2_6b_32k --datasets longbench leval

Reproduces the problem - error message

My evaluation results differ from the documented long-text evaluation scores by about 20 points, and I cannot reproduce the documented scores.

  1. Should the `max_seq_len` and `max_out_len` parameters be modified in any way?

Other information

No response

bullw avatar Apr 10 '24 12:04 bullw

For optimal performance, set `max_seq_len` to the largest value feasible, such as 32768 or even higher if possible. `max_out_len` typically has a preset default in the dataset configuration; you can adjust it to 256 or simply keep the default.
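As a concrete illustration of that advice, here is a minimal sketch of an adjusted model config. It reuses the entry from the reproduction above; the `abbr`, `path`, and exact values are illustrative, not an official recommendation:

```python
# Sketch only: raises max_seq_len for long-context datasets, per the advice above.
from opencompass.models import VLLM

models = [
    dict(
        type=VLLM,
        abbr='chatglm2-6b-32k-vllm',
        path='THUDM/chatglm2-6b-32k',
        max_seq_len=32768,   # as large as feasible for long-text evaluation
        max_out_len=512,     # dataset configs usually set their own default; 256 also works
        batch_size=32,
        generation_kwargs=dict(temperature=0),
        run_cfg=dict(num_gpus=1, num_procs=1),
    )
]
```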

liushz avatar Apr 10 '24 14:04 liushz

Thank you very much. I was able to reproduce most of the scores.

I have a follow-up question: for the subsets scored with rouge1, rouge2, rougeL, and rougeLsum, the difference is still very large.

  1. What could be the reason for this?
  2. Which metrics are used for the leaderboard ranking?

bullw avatar Apr 12 '24 03:04 bullw

@liushz

bullw avatar Apr 12 '24 03:04 bullw