opencompass
UnboundLocalError: local variable 'prompt_token_num' referenced before assignment and NO OUTPUTS
Prerequisite
Type
I'm evaluating with the officially supported tasks/models/datasets.
Environment
{'CUDA available': True,
'CUDA_HOME': '/usr/local/cuda-12.2',
'GCC': 'gcc (Ubuntu 9.4.0-1ubuntu1~20.04.3) 9.4.0',
'GPU 0,1,2,3,4,5,6,7': 'NVIDIA L40',
'MMEngine': '0.10.2',
'NVCC': 'Cuda compilation tools, release 12.2, V12.2.91',
'OpenCV': '4.9.0',
'PyTorch': '2.1.2+cu121',
'PyTorch compiling details': 'PyTorch built with:\n'
' - GCC 9.3\n'
' - C++ Version: 201703\n'
' - Intel(R) oneAPI Math Kernel Library Version '
'2022.2-Product Build 20220804 for Intel(R) 64 '
'architecture applications\n'
' - Intel(R) MKL-DNN v3.1.1 (Git Hash '
'64f6bcbcbab628e96f33a62c3e975f8535a7bde4)\n'
' - OpenMP 201511 (a.k.a. OpenMP 4.5)\n'
' - LAPACK is enabled (usually provided by '
'MKL)\n'
' - NNPACK is enabled\n'
' - CPU capability usage: AVX512\n'
' - CUDA Runtime 12.1\n'
' - NVCC architecture flags: '
'-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86;-gencode;arch=compute_90,code=sm_90\n'
' - CuDNN 8.9.6 (built against CUDA 12.2)\n'
' - Built with CuDNN 8.9.2\n'
' - Magma 2.6.1\n'
' - Build settings: BLAS_INFO=mkl, '
'BUILD_TYPE=Release, CUDA_VERSION=12.1, '
'CUDNN_VERSION=8.9.2, '
'CXX_COMPILER=/opt/rh/devtoolset-9/root/usr/bin/c++, '
'CXX_FLAGS= -D_GLIBCXX_USE_CXX11_ABI=0 '
'-fabi-version=11 -fvisibility-inlines-hidden '
'-DUSE_PTHREADPOOL -DNDEBUG -DUSE_KINETO '
'-DLIBKINETO_NOROCTRACER -DUSE_FBGEMM '
'-DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK '
'-DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE '
'-O2 -fPIC -Wall -Wextra -Werror=return-type '
'-Werror=non-virtual-dtor -Werror=bool-operation '
'-Wnarrowing -Wno-missing-field-initializers '
'-Wno-type-limits -Wno-array-bounds '
'-Wno-unknown-pragmas -Wno-unused-parameter '
'-Wno-unused-function -Wno-unused-result '
'-Wno-strict-overflow -Wno-strict-aliasing '
'-Wno-stringop-overflow -Wno-psabi '
'-Wno-error=pedantic -Wno-error=old-style-cast '
'-Wno-invalid-partial-specialization '
'-Wno-unused-private-field '
'-Wno-aligned-allocation-unavailable '
'-Wno-missing-braces -fdiagnostics-color=always '
'-faligned-new -Wno-unused-but-set-variable '
'-Wno-maybe-uninitialized -fno-math-errno '
'-fno-trapping-math -Werror=format '
'-Werror=cast-function-type '
'-Wno-stringop-overflow, LAPACK_INFO=mkl, '
'PERF_WITH_AVX=1, PERF_WITH_AVX2=1, '
'PERF_WITH_AVX512=1, '
'TORCH_DISABLE_GPU_ASSERTS=ON, '
'TORCH_VERSION=2.1.2, USE_CUDA=ON, USE_CUDNN=ON, '
'USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, '
'USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, '
'USE_MPI=OFF, USE_NCCL=1, USE_NNPACK=ON, '
'USE_OPENMP=ON, USE_ROCM=OFF, \n',
'Python': '3.10.13 (main, Sep 11 2023, 13:44:35) [GCC 11.2.0]',
'TorchVision': '0.16.2+cu121',
'numpy_random_seed': 2147483648,
'opencompass': '0.2.1+',
'sys.platform': 'linux'}
Reproduces the problem - code/configuration sample
skip
Reproduces the problem - command or script
CUDA_VISIBLE_DEVICES="0,1,2,3" python run.py \
--hf-path /data1/xxx/swift-qlora-adapter/yi-34b-v13-0102-r64-data3/Yi-34B-xxx-merged2/v0-20240102-091935/checkpoint-37195-merged/ \
--tokenizer-path /data1/xxx/swift-qlora-adapter/yi-34b-v13-0102-r64-data3/Yi-34B-xxx-merged2/v0-20240102-091935/checkpoint-37195-merged/ \
--datasets cmmlu_ppl \
--model-kwargs device_map='auto' trust_remote_code=True \
--tokenizer-kwargs padding_side='left' truncation='left' trust_remote_code=True \
--batch-size 2 \
--work-dir ./yi-34b-merged3/cmmlu_ppl/ \
--num-gpus 4
Reproduces the problem - error message
The directory /home/xxx/llm_projects/opencompass-main/yi-34b-merged3/cmmlu_ppl/20240112_183510/infer contains two .out files.
The error in them is:
01/12 18:35:19 - OpenCompass - INFO - Task [opencompass.models.huggingface.HuggingFace_v0-20240102-091935_checkpoint-37195-merged/cmmlu-professional_law,opencompass.models.huggingface.HuggingFace_v0-20240102-091935_checkpoint-37195-merged/cmmlu-jurisprudence,opencompass.models.huggingface.HuggingFace_v0-20240102-091935_checkpoint-37195-merged/cmmlu-professional_medicine,opencompass.models.huggingface.HuggingFace_v0-20240102-091935_checkpoint-37195-merged/cmmlu-chinese_history,opencompass.models.huggingface.HuggingFace_v0-20240102-091935_checkpoint-37195-merged/cmmlu-college_medicine,opencompass.models.huggingface.HuggingFace_v0-20240102-091935_checkpoint-37195-merged/cmmlu-elementary_chinese,opencompass.models.huggingface.HuggingFace_v0-20240102-091935_checkpoint-37195-merged/cmmlu-elementary_information_and_technology,opencompass.models.huggingface.HuggingFace_v0-20240102-091935_checkpoint-37195-merged/cmmlu-clinical_knowledge,opencompass.models.huggingface.HuggingFace_v0-20240102-091935_checkpoint-37195-merged/cmmlu-professional_psychology,opencompass.models.huggingface.HuggingFace_v0-20240102-091935_checkpoint-37195-merged/cmmlu-elementary_mathematics,opencompass.models.huggingface.HuggingFace_v0-20240102-091935_checkpoint-37195-merged/cmmlu-sociology,opencompass.models.huggingface.HuggingFace_v0-20240102-091935_checkpoint-37195-merged/cmmlu-legal_and_moral_basis,opencompass.models.huggingface.HuggingFace_v0-20240102-091935_checkpoint-37195-merged/cmmlu-management,opencompass.models.huggingface.HuggingFace_v0-20240102-091935_checkpoint-37195-merged/cmmlu-business_ethics,opencompass.models.huggingface.HuggingFace_v0-20240102-091935_checkpoint-37195-merged/cmmlu-chinese_literature,opencompass.models.huggingface.HuggingFace_v0-20240102-091935_checkpoint-37195-merged/cmmlu-computer_science,opencompass.models.huggingface.HuggingFace_v0-20240102-091935_checkpoint-37195-merged/cmmlu-elementary_commonsense,opencompass.models.huggingface.HuggingFace_v0-20240102-091935_che
ckpoint-37195-merged/cmmlu-marxist_theory,opencompass.models.huggingface.HuggingFace_v0-20240102-091935_checkpoint-37195-merged/cmmlu-international_law,opencompass.models.huggingface.HuggingFace_v0-20240102-091935_checkpoint-37195-merged/cmmlu-traditional_chinese_medicine,opencompass.models.huggingface.HuggingFace_v0-20240102-091935_checkpoint-37195-merged/cmmlu-marketing,opencompass.models.huggingface.HuggingFace_v0-20240102-091935_checkpoint-37195-merged/cmmlu-chinese_teacher_qualification,opencompass.models.huggingface.HuggingFace_v0-20240102-091935_checkpoint-37195-merged/cmmlu-genetics,opencompass.models.huggingface.HuggingFace_v0-20240102-091935_checkpoint-37195-merged/cmmlu-professional_accounting,opencompass.models.huggingface.HuggingFace_v0-20240102-091935_checkpoint-37195-merged/cmmlu-public_relations,opencompass.models.huggingface.HuggingFace_v0-20240102-091935_checkpoint-37195-merged/cmmlu-electrical_engineering,opencompass.models.huggingface.HuggingFace_v0-20240102-091935_checkpoint-37195-merged/cmmlu-journalism,opencompass.models.huggingface.HuggingFace_v0-20240102-091935_checkpoint-37195-merged/cmmlu-computer_security,opencompass.models.huggingface.HuggingFace_v0-20240102-091935_checkpoint-37195-merged/cmmlu-agronomy,opencompass.models.huggingface.HuggingFace_v0-20240102-091935_checkpoint-37195-merged/cmmlu-high_school_biology,opencompass.models.huggingface.HuggingFace_v0-20240102-091935_checkpoint-37195-merged/cmmlu-virology,opencompass.models.huggingface.HuggingFace_v0-20240102-091935_checkpoint-37195-merged/cmmlu-astronomy,opencompass.models.huggingface.HuggingFace_v0-20240102-091935_checkpoint-37195-merged/cmmlu-sports_science,opencompass.models.huggingface.HuggingFace_v0-20240102-091935_checkpoint-37195-merged/cmmlu-ancient_chinese,opencompass.models.huggingface.HuggingFace_v0-20240102-091935_checkpoint-37195-merged/cmmlu-high_school_mathematics,opencompass.models.huggingface.HuggingFace_v0-20240102-091935_checkpoint-37195-merged/cmmlu-education,
opencompass.models.huggingface.HuggingFace_v0-20240102-091935_checkpoint-37195-merged/cmmlu-world_history,opencompass.models.huggingface.HuggingFace_v0-20240102-091935_checkpoint-37195-merged/cmmlu-arts,opencompass.models.huggingface.HuggingFace_v0-20240102-091935_checkpoint-37195-merged/cmmlu-chinese_civil_service_exam]
Loading checkpoint shards: 0%| | 0/15 [00:00<?, ?it/s]
Loading checkpoint shards: 7%|▋ | 1/15 [00:06<01:33, 6.71s/it]
Loading checkpoint shards: 13%|█▎ | 2/15 [00:12<01:23, 6.44s/it]
Loading checkpoint shards: 20%|██ | 3/15 [00:19<01:17, 6.45s/it]
Loading checkpoint shards: 27%|██▋ | 4/15 [00:25<01:09, 6.33s/it]
Loading checkpoint shards: 33%|███▎ | 5/15 [00:31<01:02, 6.25s/it]
Loading checkpoint shards: 40%|████ | 6/15 [00:38<00:56, 6.30s/it]
Loading checkpoint shards: 47%|████▋ | 7/15 [00:44<00:49, 6.22s/it]
Loading checkpoint shards: 53%|█████▎ | 8/15 [00:50<00:43, 6.17s/it]
Loading checkpoint shards: 60%|██████ | 9/15 [00:56<00:37, 6.23s/it]
Loading checkpoint shards: 67%|██████▋ | 10/15 [01:02<00:30, 6.17s/it]
Loading checkpoint shards: 73%|███████▎ | 11/15 [01:08<00:24, 6.14s/it]
Loading checkpoint shards: 80%|████████ | 12/15 [01:15<00:18, 6.21s/it]
Loading checkpoint shards: 87%|████████▋ | 13/15 [01:21<00:12, 6.15s/it]
Loading checkpoint shards: 93%|█████████▎| 14/15 [01:27<00:06, 6.12s/it]
Loading checkpoint shards: 100%|██████████| 15/15 [01:28<00:00, 4.78s/it]
Loading checkpoint shards: 100%|██████████| 15/15 [01:28<00:00, 5.92s/it]
01/12 18:36:51 - OpenCompass - INFO - Start inferencing [opencompass.models.huggingface.HuggingFace_v0-20240102-091935_checkpoint-37195-merged/cmmlu-professional_law]
0%| | 0/211 [00:00<?, ?it/s]
100%|██████████| 211/211 [00:00<00:00, 2920785.95it/s]
Traceback (most recent call last):
File "/home/xxx/llm_projects/opencompass-main/opencompass/tasks/openicl_infer.py", line 150, in <module>
inferencer.run()
File "/home/xxx/llm_projects/opencompass-main/opencompass/tasks/openicl_infer.py", line 80, in run
self._inference()
File "/home/xxx/llm_projects/opencompass-main/opencompass/tasks/openicl_infer.py", line 128, in _inference
inferencer.inference(retriever,
File "/home/xxx/llm_projects/opencompass-main/opencompass/openicl/icl_inferencer/icl_ppl_inferencer.py", line 148, in inference
token_num_list.append(prompt_token_num)
UnboundLocalError: local variable 'prompt_token_num' referenced before assignment
[2024-01-12 18:36:57,233] torch.distributed.elastic.multiprocessing.api: [ERROR] failed (exitcode: 1) local_rank: 0 (pid: 1867675) of binary: /home/xxx/anaconda3/envs/opencompass/bin/python
Traceback (most recent call last):
File "/home/xxx/anaconda3/envs/opencompass/bin/torchrun", line 8, in <module>
sys.exit(main())
File "/home/xxx/anaconda3/envs/opencompass/lib/python3.10/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper
return f(*args, **kwargs)
File "/home/xxx/anaconda3/envs/opencompass/lib/python3.10/site-packages/torch/distributed/run.py", line 806, in main
run(args)
File "/home/xxx/anaconda3/envs/opencompass/lib/python3.10/site-packages/torch/distributed/run.py", line 797, in run
elastic_launch(
File "/home/xxx/anaconda3/envs/opencompass/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 134, in __call__
return launch_agent(self._config, self._entrypoint, list(args))
File "/home/xxx/anaconda3/envs/opencompass/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 264, in launch_agent
raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
============================================================
/home/xxx/llm_projects/opencompass-main/opencompass/tasks/openicl_infer.py FAILED
------------------------------------------------------------
Failures:
<NO_OTHER_FAILURES>
------------------------------------------------------------
Root Cause (first observed failure):
[0]:
time : 2024-01-12_18:36:57
host : vision12.local
rank : 0 (local_rank: 0)
exitcode : 1 (pid: 1867675)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
============================================================
In the eval log, the error is:
01/12 18:40:45 - OpenCompass - ERROR - /home/xxx/llm_projects/opencompass-main/opencompass/tasks/openicl_eval.py - _score - 236 - Task [opencompass.models.huggingface.HuggingFace_v0-20240102-091935_checkpoint-37195-merged/cmmlu-agronomy]: No predictions found.
01/12 18:40:45 - OpenCompass - INFO - time elapsed: 1.67s
The summary table has no scores for any dataset:
cmmlu-professional_psychology - - - -
cmmlu-public_relations - - - -
cmmlu-security_study - - - -
cmmlu-sociology - - - -
cmmlu-sports_science - - - -
cmmlu-traditional_chinese_medicine - - - -
cmmlu-virology - - - -
Other information
The model is my own fine-tuned Yi-34B.
How about using official Yi?
Strangely, adding --max-out-len and --max-seq-len to the eval script fixes the problem. With these two parameters set, the UnboundLocalError no longer occurs and the outputs are generated properly.
But why?
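One possible explanation, not verified against the OpenCompass source: the traceback would be consistent with `prompt_token_num` being assigned only inside a branch that runs when a sequence-length limit is configured, while the later `token_num_list.append(prompt_token_num)` runs unconditionally. The sketch below reproduces that pattern in isolation. `count_tokens`, `collect_token_nums_buggy`, and `collect_token_nums_fixed` are hypothetical names invented for illustration, and the whitespace tokenizer and `min` clamp are stand-ins, not the real inferencer logic.

```python
from typing import List, Optional


def count_tokens(prompt: str) -> int:
    """Hypothetical stand-in for a tokenizer-based length check."""
    return len(prompt.split())


def collect_token_nums_buggy(prompts: List[str],
                             max_seq_len: Optional[int]) -> List[int]:
    """Assumed shape of the failing loop (illustration only)."""
    token_num_list = []
    for prompt in prompts:
        if max_seq_len is not None:
            # The name is bound only when a limit is configured.
            prompt_token_num = count_tokens(prompt)
        # With max_seq_len=None the name was never bound, so this line
        # raises UnboundLocalError on the first iteration.
        token_num_list.append(prompt_token_num)
    return token_num_list


def collect_token_nums_fixed(prompts: List[str],
                             max_seq_len: Optional[int]) -> List[int]:
    """Same loop with the token count computed unconditionally."""
    token_num_list = []
    for prompt in prompts:
        prompt_token_num = count_tokens(prompt)
        if max_seq_len is not None:
            # Stand-in for real truncation logic: clamp to the limit.
            prompt_token_num = min(prompt_token_num, max_seq_len)
        token_num_list.append(prompt_token_num)
    return token_num_list


if __name__ == "__main__":
    try:
        collect_token_nums_buggy(["a b c"], None)
    except UnboundLocalError as err:
        print("buggy version failed:", err)
    print("fixed version:", collect_token_nums_fixed(["a b c"], None))
```

If the inferencer does follow this pattern, passing --max-seq-len would make the guarded branch run and bind the variable, which would match the observed behavior. The sketch says nothing about why --max-out-len would also be needed. The "No predictions found" message in the eval log would then simply follow from the inference task crashing before it wrote any predictions.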