[Bug] math_gen dataset evaluation fails randomly
Prerequisite
Type
I'm evaluating with the officially supported tasks/models/datasets.
Environment
{'CUDA available': True, 'CUDA_HOME': '/usr/local/cuda', 'GCC': 'gcc (Ubuntu 11.3.0-1ubuntu1~22.04.1) 11.3.0', 'GPU 0,1,2,3,4,5,6,7': 'NVIDIA A800-SXM4-80GB', 'MMEngine': '0.10.4', 'MUSA available': False, 'NVCC': 'Cuda compilation tools, release 11.7, V11.7.64', 'OpenCV': '4.9.0', 'PyTorch': '2.3.0+cu121', 'PyTorch compiling details': 'PyTorch built with:\n' ' - GCC 9.3\n' ' - C++ Version: 201703\n' ' - Intel(R) oneAPI Math Kernel Library Version ' '2022.2-Product Build 20220804 for Intel(R) 64 ' 'architecture applications\n' ' - Intel(R) MKL-DNN v3.3.6 (Git Hash ' '86e6af5974177e513fd3fee58425e1063e7f1361)\n' ' - OpenMP 201511 (a.k.a. OpenMP 4.5)\n' ' - LAPACK is enabled (usually provided by ' 'MKL)\n' ' - NNPACK is enabled\n' ' - CPU capability usage: AVX512\n' ' - CUDA Runtime 12.1\n' ' - NVCC architecture flags: ' '-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86;-gencode;arch=compute_90,code=sm_90\n' ' - CuDNN 8.9.2\n' ' - Magma 2.6.1\n' ' - Build settings: BLAS_INFO=mkl, ' 'BUILD_TYPE=Release, CUDA_VERSION=12.1, ' 'CUDNN_VERSION=8.9.2, ' 'CXX_COMPILER=/opt/rh/devtoolset-9/root/usr/bin/c++, ' 'CXX_FLAGS= -D_GLIBCXX_USE_CXX11_ABI=0 ' '-fabi-version=11 -fvisibility-inlines-hidden ' '-DUSE_PTHREADPOOL -DNDEBUG -DUSE_KINETO ' '-DLIBKINETO_NOROCTRACER -DUSE_FBGEMM ' '-DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK ' '-DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE ' '-O2 -fPIC -Wall -Wextra -Werror=return-type ' '-Werror=non-virtual-dtor -Werror=bool-operation ' '-Wnarrowing -Wno-missing-field-initializers ' '-Wno-type-limits -Wno-array-bounds ' '-Wno-unknown-pragmas -Wno-unused-parameter ' '-Wno-unused-function -Wno-unused-result ' '-Wno-strict-overflow -Wno-strict-aliasing ' '-Wno-stringop-overflow -Wsuggest-override ' '-Wno-psabi -Wno-error=pedantic ' '-Wno-error=old-style-cast -Wno-missing-braces ' '-fdiagnostics-color=always -faligned-new ' '-Wno-unused-but-set-variable ' '-Wno-maybe-uninitialized -fno-math-errno ' '-fno-trapping-math -Werror=format ' '-Wno-stringop-overflow, LAPACK_INFO=mkl, ' 'PERF_WITH_AVX=1, PERF_WITH_AVX2=1, ' 'PERF_WITH_AVX512=1, TORCH_VERSION=2.3.0, ' 'USE_CUDA=ON, USE_CUDNN=ON, USE_CUSPARSELT=1, ' 'USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, ' 'USE_GLOG=OFF, USE_GLOO=ON, USE_MKL=ON, ' 'USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=1, ' 'USE_NNPACK=ON, USE_OPENMP=ON, USE_ROCM=OFF, ' 'USE_ROCM_KERNEL_ASSERT=OFF, \n', 'Python': '3.10.13 (main, Sep 11 2023, 13:44:35) [GCC 11.2.0]', 'TorchVision': '0.18.0+cu121', 'numpy_random_seed': 2147483648, 'opencompass': '0.2.4+', 'sys.platform': 'linux'}
Reproduces the problem - code/configuration sample
Running the official Qwen1.5-1.8B model on the math_gen dataset.
Reproduces the problem - command or script
CUDA_VISIBLE_DEVICES=0 python run.py \
    --datasets math_gen \
    --hf-path local_Qwen1.5 \
    --tokenizer-path local_Qwen1.5 \
    --work-dir ./outputs/ \
    --model-kwargs device_map='auto' \
    --tokenizer-kwargs padding_side='left' truncation='left' use_fast=False \
    --max-out-len 100 \
    --max-seq-len 2048 \
    --batch-size 8 \
    --no-batch-padding \
    --num-gpus 1
Reproduces the problem - error message
opencompass/opencompass/runners/base.py - summarize - 64 - OpenICLInfer[opencompass.models.huggingface.HuggingFace_download_Qwen1.5-1.8B/math_24] failed with code 1
opencompass/opencompass/tasks/openicl_eval.py - _score - 239 - Task [opencompass.models.huggingface.HuggingFace_download_Qwen1.5-1.8B/math]: preds and refrs have different length
Other information
I'm evaluating the unmodified official Qwen1.5 model with opencompass. The math dataset gets split into several partitions, and on every run a different partition fails with the error above: this time it was math_24, and in earlier runs it was math_6 and others.
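As a quick sanity check, here is a minimal diagnostic sketch (not part of OpenCompass; the `outputs/*/predictions/*/math_*.json` path pattern is an assumption based on the default work-dir layout, adjust it to your actual `--work-dir`) that counts how many predictions each math partition actually produced, so an empty or truncated shard such as math_24 stands out:

```python
# Count predictions per math_* partition file to spot an incomplete shard.
# Assumes each partition is a JSON dict keyed by sample index (default layout).
import json
from glob import glob

pred_files = sorted(glob("outputs/*/predictions/*/math_*.json"))
total = 0
for path in pred_files:
    with open(path, encoding="utf-8") as f:
        preds = json.load(f)
    total += len(preds)
    print(f"{path}: {len(preds)} predictions")
print(f"total predictions across all partitions: {total}")
```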
Have you changed your partition logic midway? If you run it all at once, this problem shouldn't occur.
- I have not made any changes.
- To avoid unexpected bugs, I also removed ~/.cache before running the script.
The error in your eval stage appears because some errors occurred during your infer stage, so the number of predictions differs from the number of references. You can check the log from the infer stage.
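As a concrete way to do that check, a minimal sketch (the log path pattern is an assumption based on the default OpenCompass output layout and may need adjusting for your `--work-dir`) that scans each partition's infer log for the first line that looks like an error:

```python
# Scan per-partition infer logs and report the first error-looking line,
# to find out why a shard (e.g. math_24) exited with code 1.
from glob import glob

for path in sorted(glob("outputs/*/logs/infer/*/math_*.out")):
    with open(path, encoding="utf-8", errors="replace") as f:
        for lineno, line in enumerate(f, 1):
            if "Traceback" in line or "Error" in line:
                print(f"{path}:{lineno}: {line.rstrip()}")
                break
```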
Hi liushz,
The log is here, but I cannot figure out what is going wrong:
W0513 14:40:06.153000 139835420161856 torch/distributed/elastic/agent/server/api.py:741] Received Signals.SIGHUP death signal, shutting down workers
W0513 14:40:06.154000 139835420161856 torch/distributed/elastic/multiprocessing/api.py:851] Sending process 2678994 closing signal SIGHUP
Traceback (most recent call last):
File "/home/jovyan/anaconda3/envs/opencompass/bin/torchrun", line 8, in