opencompass icon indicating copy to clipboard operation
opencompass copied to clipboard

[Bug] Different result on mmlu between opencompass and lm-evaluation-harness

Open kirliavc opened this issue 1 year ago • 2 comments

Prerequisite

Type

I'm evaluating with the officially supported tasks/models/datasets.

Environment

{'CUDA available': True, 'CUDA_HOME': '/usr/local/cuda-11.3', 'GCC': 'gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0', 'GPU 0,1,2,3,4,5,6,7': 'NVIDIA A100-PCIE-40GB', 'MMEngine': '0.8.5', 'NVCC': 'Cuda compilation tools, release 11.3, V11.3.58', 'OpenCV': '4.8.1', 'PyTorch': '2.1.0', 'PyTorch compiling details': 'PyTorch built with:\n' ' - GCC 9.3\n' ' - C++ Version: 201703\n' ' - Intel(R) oneAPI Math Kernel Library Version ' '2023.1-Product Build 20230303 for Intel(R) 64 ' 'architecture applications\n' ' - OpenMP 201511 (a.k.a. OpenMP 4.5)\n' ' - LAPACK is enabled (usually provided by ' 'MKL)\n' ' - NNPACK is enabled\n' ' - CPU capability usage: AVX512\n' ' - CUDA Runtime 11.8\n' ' - NVCC architecture flags: ' '-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86;-gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_90,code=sm_90;-gencode;arch=compute_37,code=compute_37\n' ' - CuDNN 8.7\n' ' - Magma 2.6.1\n' ' - Build settings: BLAS_INFO=mkl, ' 'BUILD_TYPE=Release, CUDA_VERSION=11.8, ' 'CUDNN_VERSION=8.7.0, ' 'CXX_COMPILER=/opt/rh/devtoolset-9/root/usr/bin/c++, ' 'CXX_FLAGS= -D_GLIBCXX_USE_CXX11_ABI=0 ' '-fabi-version=11 -fvisibility-inlines-hidden ' '-DUSE_PTHREADPOOL -DNDEBUG -DUSE_KINETO ' '-DLIBKINETO_NOROCTRACER -DUSE_FBGEMM ' '-DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK ' '-DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE ' '-O2 -fPIC -Wall -Wextra -Werror=return-type ' '-Werror=non-virtual-dtor -Werror=bool-operation ' '-Wnarrowing -Wno-missing-field-initializers ' '-Wno-type-limits -Wno-array-bounds ' '-Wno-unknown-pragmas -Wno-unused-parameter ' '-Wno-unused-function -Wno-unused-result ' '-Wno-strict-overflow -Wno-strict-aliasing ' '-Wno-stringop-overflow -Wno-psabi ' '-Wno-error=pedantic -Wno-error=old-style-cast ' '-Wno-invalid-partial-specialization ' '-Wno-unused-private-field ' '-Wno-aligned-allocation-unavailable ' '-Wno-missing-braces -fdiagnostics-color=always ' '-faligned-new -Wno-unused-but-set-variable ' '-Wno-maybe-uninitialized -fno-math-errno ' '-fno-trapping-math -Werror=format ' '-Werror=cast-function-type ' '-Wno-stringop-overflow, LAPACK_INFO=mkl, ' 'PERF_WITH_AVX=1, PERF_WITH_AVX2=1, ' 'PERF_WITH_AVX512=1, ' 'TORCH_DISABLE_GPU_ASSERTS=ON, ' 'TORCH_VERSION=2.1.0, USE_CUDA=ON, USE_CUDNN=ON, ' 'USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, ' 'USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, ' 'USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, ' 'USE_OPENMP=ON, USE_ROCM=OFF, \n', 'Python': '3.10.13 (main, Sep 11 2023, 13:44:35) [GCC 11.2.0]', 'TorchVision': '0.16.0', 'numpy_random_seed': 2147483648, 'opencompass': '0.1.5+3326c54', 'sys.platform': 'linux'}

Reproduces the problem - code/configuration sample

None

Reproduces the problem - command or script

opencompass:

python run.py --datasets mmlu_ppl \
--hf-path /path/to/llama-7b \
--model-kwargs device_map='auto' \
--tokenizer-kwargs padding_side='left' truncation='left' trust_remote_code=True \
--max-out-len 100 \
--max-seq-len 2048 \
--batch-size 8 \
--no-batch-padding \
--max-partition-size 4000 \
--num-gpus 1

lm-eval:

python main.py \
    --model hf-causal-experimental \
    --model_args pretrained=/path/to/llama-7b,use_accelerate=True,trust_remote_code=True,dtype="float16" \
    --tasks "hendrycksTest-abstract_algebra" \
    --batch_size 4 \
    --device auto \
    --output_path results/llama-7b/mmlu_abstract_algebra.json \
    --write_out \
    --output_base_path ./outputs \
    --num_fewshot 5 \
--no_cache

Reproduces the problem - error message

The results of opencompass and lm-eval are quite different. I find that the input prompt of two frameworks are totally same, but the results (choice answer) are different. In the abstract_algebra benchmark, lm-eval got 34/100 points, while opencompass only got 25/100 points.

lm-evaluation-harness result:

"results": {
    "hendrycksTest-abstract_algebra": {
      "acc": 0.34,
      "acc_stderr": 0.047609522856952365,
      "acc_norm": 0.34,
      "acc_norm_stderr": 0.047609522856952365
    }
  },

opencompass result:

lukaemon_mmlu_abstract_algebra,9d6923,accuracy,ppl,25.00

Other information

Here is the detailed input and output of one specific question which was solved by correctly by lm-eval but wrong in opencompass. The prompt of two freamworks are exactly same, but the results are different. lm_eval:

{
        "doc_id": 62,
        "prompt_0": "The following are multiple choice questions (with answers) about abstract algebra.\n\nFind all c in Z_3 such that Z_3[x]/(x^2 + c) is a field.\nA. 0\nB. 1\nC. 2\nD. 3\nAnswer: B\n\nStatement 1 | If aH is an element of a factor group, then |aH| divides |a|. Statement 2 | If H and K are subgroups of G then HK is a subgroup of G.\nA. True, True\nB. False, False\nC. True, False\nD. False, True\nAnswer: B\n\nStatement 1 | Every element of a group generates a cyclic subgroup of the group. Statement 2 | The symmetric group S_10 has 10 elements.\nA. True, True\nB. False, False\nC. True, False\nD. False, True\nAnswer: C\n\nStatement 1| Every function from a finite set onto itself must be one to one. Statement 2 | Every subgroup of an abelian group is abelian.\nA. True, True\nB. False, False\nC. True, False\nD. False, True\nAnswer: A\n\nFind the characteristic of the ring 2Z.\nA. 0\nB. 3\nC. 12\nD. 30\nAnswer: A\n\nStatement 1 | Every homomorphic image of a group G is isomorphic to a factor group of G. Statement 2 | The homomorphic images of a group G are the same (up to isomorphism) as the factor groups of G.\nA. True, True\nB. False, False\nC. True, False\nD. False, True\nAnswer: A",
        "prompt_1": "The following are multiple choice questions (with answers) about abstract algebra.\n\nFind all c in Z_3 such that Z_3[x]/(x^2 + c) is a field.\nA. 0\nB. 1\nC. 2\nD. 3\nAnswer: B\n\nStatement 1 | If aH is an element of a factor group, then |aH| divides |a|. Statement 2 | If H and K are subgroups of G then HK is a subgroup of G.\nA. True, True\nB. False, False\nC. True, False\nD. False, True\nAnswer: B\n\nStatement 1 | Every element of a group generates a cyclic subgroup of the group. Statement 2 | The symmetric group S_10 has 10 elements.\nA. True, True\nB. False, False\nC. True, False\nD. False, True\nAnswer: C\n\nStatement 1| Every function from a finite set onto itself must be one to one. Statement 2 | Every subgroup of an abelian group is abelian.\nA. True, True\nB. False, False\nC. True, False\nD. False, True\nAnswer: A\n\nFind the characteristic of the ring 2Z.\nA. 0\nB. 3\nC. 12\nD. 30\nAnswer: A\n\nStatement 1 | Every homomorphic image of a group G is isomorphic to a factor group of G. Statement 2 | The homomorphic images of a group G are the same (up to isomorphism) as the factor groups of G.\nA. True, True\nB. False, False\nC. True, False\nD. False, True\nAnswer: B",
        "prompt_2": "The following are multiple choice questions (with answers) about abstract algebra.\n\nFind all c in Z_3 such that Z_3[x]/(x^2 + c) is a field.\nA. 0\nB. 1\nC. 2\nD. 3\nAnswer: B\n\nStatement 1 | If aH is an element of a factor group, then |aH| divides |a|. Statement 2 | If H and K are subgroups of G then HK is a subgroup of G.\nA. True, True\nB. False, False\nC. True, False\nD. False, True\nAnswer: B\n\nStatement 1 | Every element of a group generates a cyclic subgroup of the group. Statement 2 | The symmetric group S_10 has 10 elements.\nA. True, True\nB. False, False\nC. True, False\nD. False, True\nAnswer: C\n\nStatement 1| Every function from a finite set onto itself must be one to one. Statement 2 | Every subgroup of an abelian group is abelian.\nA. True, True\nB. False, False\nC. True, False\nD. False, True\nAnswer: A\n\nFind the characteristic of the ring 2Z.\nA. 0\nB. 3\nC. 12\nD. 30\nAnswer: A\n\nStatement 1 | Every homomorphic image of a group G is isomorphic to a factor group of G. Statement 2 | The homomorphic images of a group G are the same (up to isomorphism) as the factor groups of G.\nA. True, True\nB. False, False\nC. True, False\nD. False, True\nAnswer: C",
        "prompt_3": "The following are multiple choice questions (with answers) about abstract algebra.\n\nFind all c in Z_3 such that Z_3[x]/(x^2 + c) is a field.\nA. 0\nB. 1\nC. 2\nD. 3\nAnswer: B\n\nStatement 1 | If aH is an element of a factor group, then |aH| divides |a|. Statement 2 | If H and K are subgroups of G then HK is a subgroup of G.\nA. True, True\nB. False, False\nC. True, False\nD. False, True\nAnswer: B\n\nStatement 1 | Every element of a group generates a cyclic subgroup of the group. Statement 2 | The symmetric group S_10 has 10 elements.\nA. True, True\nB. False, False\nC. True, False\nD. False, True\nAnswer: C\n\nStatement 1| Every function from a finite set onto itself must be one to one. Statement 2 | Every subgroup of an abelian group is abelian.\nA. True, True\nB. False, False\nC. True, False\nD. False, True\nAnswer: A\n\nFind the characteristic of the ring 2Z.\nA. 0\nB. 3\nC. 12\nD. 30\nAnswer: A\n\nStatement 1 | Every homomorphic image of a group G is isomorphic to a factor group of G. Statement 2 | The homomorphic images of a group G are the same (up to isomorphism) as the factor groups of G.\nA. True, True\nB. False, False\nC. True, False\nD. False, True\nAnswer: D",
        "logit_0": -1.2657661437988281,
        "truth": 0,
        "logit_1": -1.2786054611206055,
        "logit_2": -1.3835443258285522,
        "logit_3": -1.7316675186157227,
        "acc": "1.0",
        "acc_norm": "1.0"
    },

opencompass:

"6": {
        "in-context examples": "Find all c in Z_3 such that Z_3[x]/(x^2 + c) is a field.\nA. 0\nB. 1\nC. 2\nD. 3\nAnswer: B\n\nStatement 1 | If aH is an element of a factor group, then |aH| divides |a|. Statement 2 | If H and K are subgroups of G then HK is a subgroup of G.\nA. True, True\nB. False, False\nC. True, False\nD. False, True\nAnswer: B\n\nStatement 1 | Every element of a group generates a cyclic subgroup of the group. Statement 2 | The symmetric group S_10 has 10 elements.\nA. True, True\nB. False, False\nC. True, False\nD. False, True\nAnswer: C\n\nStatement 1| Every function from a finite set onto itself must be one to one. Statement 2 | Every subgroup of an abelian group is abelian.\nA. True, True\nB. False, False\nC. True, False\nD. False, True\nAnswer: A\n\nFind the characteristic of the ring 2Z.\nA. 0\nB. 3\nC. 12\nD. 30\nAnswer: A\n\n",
        "label: A": {
            "testing input": "The following are multiple choice questions (with answers) about abstract algebra.\n\nStatement 1 | Every homomorphic image of a group G is isomorphic to a factor group of G. Statement 2 | The homomorphic images of a group G are the same (up to isomorphism) as the factor groups of G.\nA. True, True\nB. False, False\nC. True, False\nD. False, True\nAnswer: A",
            "prompt": "The following are multiple choice questions (with answers) about abstract algebra.\n\nFind all c in Z_3 such that Z_3[x]/(x^2 + c) is a field.\nA. 0\nB. 1\nC. 2\nD. 3\nAnswer: B\n\nStatement 1 | If aH is an element of a factor group, then |aH| divides |a|. Statement 2 | If H and K are subgroups of G then HK is a subgroup of G.\nA. True, True\nB. False, False\nC. True, False\nD. False, True\nAnswer: B\n\nStatement 1 | Every element of a group generates a cyclic subgroup of the group. Statement 2 | The symmetric group S_10 has 10 elements.\nA. True, True\nB. False, False\nC. True, False\nD. False, True\nAnswer: C\n\nStatement 1| Every function from a finite set onto itself must be one to one. Statement 2 | Every subgroup of an abelian group is abelian.\nA. True, True\nB. False, False\nC. True, False\nD. False, True\nAnswer: A\n\nFind the characteristic of the ring 2Z.\nA. 0\nB. 3\nC. 12\nD. 30\nAnswer: A\n\nStatement 1 | Every homomorphic image of a group G is isomorphic to a factor group of G. Statement 2 | The homomorphic images of a group G are the same (up to isomorphism) as the factor groups of G.\nA. True, True\nB. False, False\nC. True, False\nD. False, True\nAnswer: A",
            "PPL": 1.0728914677189085
        },
        "label: B": {
            "testing input": "The following are multiple choice questions (with answers) about abstract algebra.\n\nStatement 1 | Every homomorphic image of a group G is isomorphic to a factor group of G. Statement 2 | The homomorphic images of a group G are the same (up to isomorphism) as the factor groups of G.\nA. True, True\nB. False, False\nC. True, False\nD. False, True\nAnswer: B",
            "prompt": "The following are multiple choice questions (with answers) about abstract algebra.\n\nFind all c in Z_3 such that Z_3[x]/(x^2 + c) is a field.\nA. 0\nB. 1\nC. 2\nD. 3\nAnswer: B\n\nStatement 1 | If aH is an element of a factor group, then |aH| divides |a|. Statement 2 | If H and K are subgroups of G then HK is a subgroup of G.\nA. True, True\nB. False, False\nC. True, False\nD. False, True\nAnswer: B\n\nStatement 1 | Every element of a group generates a cyclic subgroup of the group. Statement 2 | The symmetric group S_10 has 10 elements.\nA. True, True\nB. False, False\nC. True, False\nD. False, True\nAnswer: C\n\nStatement 1| Every function from a finite set onto itself must be one to one. Statement 2 | Every subgroup of an abelian group is abelian.\nA. True, True\nB. False, False\nC. True, False\nD. False, True\nAnswer: A\n\nFind the characteristic of the ring 2Z.\nA. 0\nB. 3\nC. 12\nD. 30\nAnswer: A\n\nStatement 1 | Every homomorphic image of a group G is isomorphic to a factor group of G. Statement 2 | The homomorphic images of a group G are the same (up to isomorphism) as the factor groups of G.\nA. True, True\nB. False, False\nC. True, False\nD. False, True\nAnswer: B",
            "PPL": 1.0727724956376903
        },
        "label: C": {
            "testing input": "The following are multiple choice questions (with answers) about abstract algebra.\n\nStatement 1 | Every homomorphic image of a group G is isomorphic to a factor group of G. Statement 2 | The homomorphic images of a group G are the same (up to isomorphism) as the factor groups of G.\nA. True, True\nB. False, False\nC. True, False\nD. False, True\nAnswer: C",
            "prompt": "The following are multiple choice questions (with answers) about abstract algebra.\n\nFind all c in Z_3 such that Z_3[x]/(x^2 + c) is a field.\nA. 0\nB. 1\nC. 2\nD. 3\nAnswer: B\n\nStatement 1 | If aH is an element of a factor group, then |aH| divides |a|. Statement 2 | If H and K are subgroups of G then HK is a subgroup of G.\nA. True, True\nB. False, False\nC. True, False\nD. False, True\nAnswer: B\n\nStatement 1 | Every element of a group generates a cyclic subgroup of the group. Statement 2 | The symmetric group S_10 has 10 elements.\nA. True, True\nB. False, False\nC. True, False\nD. False, True\nAnswer: C\n\nStatement 1| Every function from a finite set onto itself must be one to one. Statement 2 | Every subgroup of an abelian group is abelian.\nA. True, True\nB. False, False\nC. True, False\nD. False, True\nAnswer: A\n\nFind the characteristic of the ring 2Z.\nA. 0\nB. 3\nC. 12\nD. 30\nAnswer: A\n\nStatement 1 | Every homomorphic image of a group G is isomorphic to a factor group of G. Statement 2 | The homomorphic images of a group G are the same (up to isomorphism) as the factor groups of G.\nA. True, True\nB. False, False\nC. True, False\nD. False, True\nAnswer: C",
            "PPL": 1.073089754520939
        },
        "label: D": {
            "testing input": "The following are multiple choice questions (with answers) about abstract algebra.\n\nStatement 1 | Every homomorphic image of a group G is isomorphic to a factor group of G. Statement 2 | The homomorphic images of a group G are the same (up to isomorphism) as the factor groups of G.\nA. True, True\nB. False, False\nC. True, False\nD. False, True\nAnswer: D",
            "prompt": "The following are multiple choice questions (with answers) about abstract algebra.\n\nFind all c in Z_3 such that Z_3[x]/(x^2 + c) is a field.\nA. 0\nB. 1\nC. 2\nD. 3\nAnswer: B\n\nStatement 1 | If aH is an element of a factor group, then |aH| divides |a|. Statement 2 | If H and K are subgroups of G then HK is a subgroup of G.\nA. True, True\nB. False, False\nC. True, False\nD. False, True\nAnswer: B\n\nStatement 1 | Every element of a group generates a cyclic subgroup of the group. Statement 2 | The symmetric group S_10 has 10 elements.\nA. True, True\nB. False, False\nC. True, False\nD. False, True\nAnswer: C\n\nStatement 1| Every function from a finite set onto itself must be one to one. Statement 2 | Every subgroup of an abelian group is abelian.\nA. True, True\nB. False, False\nC. True, False\nD. False, True\nAnswer: A\n\nFind the characteristic of the ring 2Z.\nA. 0\nB. 3\nC. 12\nD. 30\nAnswer: A\n\nStatement 1 | Every homomorphic image of a group G is isomorphic to a factor group of G. Statement 2 | The homomorphic images of a group G are the same (up to isomorphism) as the factor groups of G.\nA. True, True\nB. False, False\nC. True, False\nD. False, True\nAnswer: D",
            "PPL": 1.073962216449873
        },
        "prediction": "B",
        "gold": "A"
    },

kirliavc avatar Oct 10 '23 06:10 kirliavc

maybe few-shot examples are different? what about all 0-shot?

RanchiZhao avatar Oct 12 '23 02:10 RanchiZhao

The problem is probably due to a special token added in opencompass, but lm-eval disabled the special token. lm-eval says,

WARNING: Evaluating causal models with `add_special_tokens=True` is
                currently __not__ supported.

kirliavc avatar Oct 12 '23 02:10 kirliavc

Feel free to re-open if needed.

tonysy avatar Feb 19 '24 05:02 tonysy