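Dataset tool_bench is not available
Self-check checklist
Before submitting an issue, please make sure you have completed the following steps:
Problem description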
Exception: Unknown benchmark: tool_bench. Available tasks: ['drop', 'humaneval', 'mmlu_redux', 'hpdv2', 'genai_bench', 'evalmuse', 'general_t2i', 'tifa160', 'gsm8k', 'general_qa', 'gpqa', 'competition_math', 'arena_hard', 'super_gpqa', 'simple_qa', 'mmlu', 'winogrande', 'cmmlu', 'live_code_bench', 'math_500', 'maritime_bench', 'trivia_qa', 'bbh', 'ceval', 'process_bench', 'hellaswag', 'race', 'truthful_qa', 'iquiz', 'ifeval', 'chinese_simpleqa', 'musr', 'alpaca_eval', 'arc', 'general_mcq', 'data_collection', 'mmlu_pro', 'aime25', 'aime24']
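As a hedged illustration of what the exception is reporting, the sketch below checks requested dataset names against the list of available tasks copied from the error message above. It does not call EvalScope; `AVAILABLE_TASKS` and `check_datasets` are hypothetical names introduced here, and the list is only valid for the installation that produced this error.

```python
import difflib

# Task names copied verbatim from the error message above (assumption:
# this list reflects only the installation that raised the exception).
AVAILABLE_TASKS = [
    'drop', 'humaneval', 'mmlu_redux', 'hpdv2', 'genai_bench', 'evalmuse',
    'general_t2i', 'tifa160', 'gsm8k', 'general_qa', 'gpqa',
    'competition_math', 'arena_hard', 'super_gpqa', 'simple_qa', 'mmlu',
    'winogrande', 'cmmlu', 'live_code_bench', 'math_500', 'maritime_bench',
    'trivia_qa', 'bbh', 'ceval', 'process_bench', 'hellaswag', 'race',
    'truthful_qa', 'iquiz', 'ifeval', 'chinese_simpleqa', 'musr',
    'alpaca_eval', 'arc', 'general_mcq', 'data_collection', 'mmlu_pro',
    'aime25', 'aime24',
]


def check_datasets(requested, available=AVAILABLE_TASKS):
    """Hypothetical helper: split names into known ones and unknown ones.

    Unknown names map to a (possibly empty) list of similarly spelled
    available names, found with difflib.
    """
    known = [name for name in requested if name in available]
    unknown = {
        name: difflib.get_close_matches(name, available, n=3)
        for name in requested
        if name not in available
    }
    return known, unknown


known, unknown = check_datasets(['tool_bench', 'gsm8k'])
print('known:', known)
print('unknown:', unknown)
```

Running a check like this before building the task configuration would flag `tool_bench` as unknown for this installation without loading the model first.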
EvalScope version (required)
v2.0.0
Tools used
- [ ] Native framework
- [ ] Opencompass backend
- [ ] VLMEvalKit backend
- [ ] RAGEval backend
- [ ] Perf / model inference stress-testing tool
- [ ] Arena mode
Code or command executed
(evalscope) zengxiangxi@xianshitest3day3:~/project/lvyouLLM/Fine_tuning/evl$ python evl_tool.py
Error log
(evalscope) zengxiangxi@xianshitest3day3:~/project/lvyouLLM/Fine_tuning/evl$ python evl_tool.py
2025-05-26 11:16:36,720 - evalscope - INFO - Args: Task config is provided with TaskConfig type.
2025-05-26 11:16:38,700 - evalscope - INFO - Loading model /data/home/zengxiangxi/project/lvyouLLM/Fine_tuning/merged_qwen3_0.6b_lora_tourism3 ...
Traceback (most recent call last):
File "/data/home/zengxiangxi/project/lvyouLLM/Fine_tuning/evl/evl_tool.py", line 17, in
Runtime environment
from evalscope import TaskConfig, run_task

task_cfg = TaskConfig(
    model='/data/home/zengxiangxi/project/lvyouLLM/Fine_tuning/merged_qwen3_0.6b_lora_tourism3',
    datasets=['tool_bench'],
    limit=5,
    eval_batch_size=5,
    generation_config={
        'max_new_tokens': 512,  # Maximum number of tokens to generate, set to a large value to avoid truncation
        'temperature': 0.7,  # Sampling temperature (recommended value by qwen)
        'top_p': 0.8,  # Top-p sampling (recommended value by qwen)
        'top_k': 20,  # Top-k sampling (recommended value by qwen)
        'chat_template_kwargs': {'enable_thinking': False}  # Disable thinking mode
    }
)

run_task(task_cfg=task_cfg)
- Operating system:
- Python version: 3.10
Additional information
If you have any other relevant information, please provide it here.