Testing with an OpenAI server produces inference results, but the eval output is empty
Issue Description
Please briefly describe the issue you encountered.
Tools Used
- [ ] Native framework
- [✅] Opencompass backend
- [ ] VLMEvalKit backend
- [ ] RAGEval backend
- [ ] Perf / model inference stress-testing tool
- [ ] Arena mode
Code or Commands Executed
Please provide the main code or commands you executed. For example:
from evalscope.run import run_task
from evalscope.summarizer import Summarizer

task_cfg_dict = dict(
    eval_backend='OpenCompass',
    eval_config={
        'datasets': ['gsm8k'],
        'models': [
            {'path': '/workspace/models/Llama-2-13b-chat-hf',
             'openai_api_base': 'http://127.0.0.1:8008/v1/chat/completions',
             'is_chat': True,
             'batch_size': 16},
        ],
        'work_dir': 'outputs/llama-2-13b-chat-hf',
        'limit': None,
    },
)


def run_eval():
    # Option 1: Python dict
    task_cfg = task_cfg_dict
    # Option 2: YAML config file
    # task_cfg = 'eval_openai_api.yaml'
    # Option 3: JSON config file
    # task_cfg = 'eval_openai_api.json'
    # print(task_cfg)
    run_task(task_cfg=task_cfg)
    print('>> Start to get the report with summarizer ...')
    report_list = Summarizer.get_report_from_cfg(task_cfg)
    print(f'\n>> The report list: {report_list}')


run_eval()
# Terminal command executed
python script.py
Error Log
Please paste the full error log or console output. For example:
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
dataset                                 version    metric    mode    /workspace/models/Llama-2-13b-chat-hf
--------------------------------------  ---------  --------  ------  ---------------------------------------
--------- 考试 Exam ---------           -          -         -       -
ceval                                   -          -         -       -
agieval                                 -          -         -       -
mmlu                                    -          -         -       -
GaokaoBench                             -          -         -       -
ARC-c                                   -          -         -       -
--------- 语言 Language ---------       -          -         -       -
WiC                                     -          -         -       -
summedits                               -          -         -       -
chid-dev                                -          -         -       -
afqmc-dev                               -          -         -       -
bustm-dev                               -          -         -       -
cluewsc-dev                             -          -         -       -
WSC                                     -          -         -       -
winogrande                              -          -         -       -
flores_100                              -          -         -       -
--------- 知识 Knowledge ---------      -          -         -       -
BoolQ                                   -          -         -       -
commonsense_qa                          -          -         -       -
nq                                      -          -         -       -
triviaqa                                -          -         -       -
--------- 推理 Reasoning ---------      -          -         -       -
cmnli                                   -          -         -       -
ocnli                                   -          -         -       -
ocnli_fc-dev                            -          -         -       -
AX_b                                    -          -         -       -
AX_g                                    -          -         -       -
CB                                      -          -         -       -
RTE                                     -          -         -       -
story_cloze                             -          -         -       -
COPA                                    -          -         -       -
ReCoRD                                  -          -         -       -
hellaswag                               -          -         -       -
piqa                                    -          -         -       -
siqa                                    -          -         -       -
strategyqa                              -          -         -       -
math                                    -          -         -       -
gsm8k                                   -          -         -       -
TheoremQA                               -          -         -       -
openai_humaneval                        -          -         -       -
mbpp                                    -          -         -       -
bbh                                     -          -         -       -
--------- 理解 Understanding ---------  -          -         -       -
C3                                      -          -         -       -
CMRC_dev                                -          -         -       -
DRCD_dev                                -          -         -       -
MultiRC                                 -          -         -       -
race-middle                             -          -         -       -
race-high                               -          -         -       -
openbookqa_fact                         -          -         -       -
csl_dev                                 -          -         -       -
lcsts                                   -          -         -       -
Xsum                                    -          -         -       -
eprstmt-dev                             -          -         -       -
lambada                                 -          -         -       -
tnews-dev                               -          -         -       -
Runtime Environment
- Operating System:
  - [ ] Windows
  - [ ] macOS
  - [✅] Ubuntu
- Python Version:
  - [ ] 3.11
  - [✅] 3.10
  - [ ] 3.9
Additional Information
If there is any other relevant information, please provide it here.
Below is the out file under the model path:
`11/21 15:20:16 - OpenCompass - INFO - Task [/workspace/models/Llama-2-13b-chat-hf/gsm8k]
11/21 15:20:24 - OpenCompass - WARNING - Max Completion tokens for /workspace/models/Llama-2-13b-chat-hf is :16384
11/21 15:20:26 - OpenCompass - INFO - Start inferencing [/workspace/models/Llama-2-13b-chat-hf/gsm8k]
[2024-11-21 15:20:27,222] [opencompass.openicl.icl_inferencer.icl_gen_inferencer] [INFO] Starting build dataloader
[2024-11-21 15:20:27,222] [opencompass.openicl.icl_inferencer.icl_gen_inferencer] [INFO] Starting inference process...
0%| | 0/83 [00:00<?, ?it/s]
100%|██████████| 83/83 [17:12<00:00, 12.44s/it]
11/21 15:37:40 - OpenCompass - INFO - time elapsed: 1043.18s`
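The log above shows inference for gsm8k running to completion, so one way to narrow down where the pipeline stops is to check which output files exist under `work_dir`. The following is a minimal sketch, assuming that OpenCompass writes JSON files into subdirectories named `predictions` and `results` somewhere below the work directory; that layout is an assumption and may differ between versions. `count_stage_files` is a hypothetical helper, not part of EvalScope or OpenCompass.

```python
from pathlib import Path


def count_stage_files(work_dir, stages=("predictions", "results")):
    """Count JSON files located under any directory named after each stage.

    The stage directory names are an assumption about the output layout.
    """
    root = Path(work_dir)
    counts = {stage: 0 for stage in stages}
    if not root.is_dir():
        return counts
    for path in root.rglob("*.json"):
        # Only look at the parent directories, not the file name itself.
        parent_parts = path.relative_to(root).parts[:-1]
        for stage in stages:
            if stage in parent_parts:
                counts[stage] += 1
    return counts


if __name__ == "__main__":
    # work_dir taken from the task configuration in this issue
    print(count_stage_files("outputs/llama-2-13b-chat-hf"))
```

If the `predictions` count is non-zero while `results` is zero, the evaluation stage probably did not run or failed, which would be consistent with a summary table that contains only `-`.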
Please run `pip list | grep ms-opencompass` to check the OpenCompass version.
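The same version check can also be done from Python, which can be convenient inside a notebook or a container without a shell. This is a minimal sketch using the standard library's `importlib.metadata`; `installed_version` is a hypothetical helper, and the `evalscope` distribution name is an assumption (only `ms-opencompass` is named in this thread).

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of a package, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # "evalscope" is an assumed distribution name; adjust if it differs.
    for name in ("ms-opencompass", "evalscope"):
        print(f"{name}: {installed_version(name)}")
```

A result of `None` for `ms-opencompass` would mean the package is not installed in the active environment.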
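Thank you for your feedback! We are closing this issue. If you have any questions, feel free to reopen it. If EvalScope has been helpful to you, please consider giving us a STAR to show your support. Thank you!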