fastllm Qwen3-Next-80B-A3B不支持工具调用

window系统，在openwebui中使用模型Qwen3-Next-80B-A3B-Instruct-UD-Q4_K_M，原生调用fetch mcp 工具失败，截图如下，终端错误如下

(.venv) PS D:\python_project\Fastllm_P> ftllm server E:\Model\Qwen3-Next-80B-A3B-Instruct-UD-Q4_K_M `

-t 8 --host 0.0.0.0 --port 8000 --api_key sk-001 --device cuda --moe_device "{'cuda':15,'cpu':85}" --kv_cache_limit 16G 2025-09-30 17:14:35,548 7032 server.py[line:159] INFO: Namespace(command='server', version=False, model='E:\Model\Qwen3-Next-80B-A3B-Instruct-UD-Q4_K_M', path='', threads=8, low=False, dtype='auto', moe_dtype='', atype='auto', cuda_embedding=False, kv_cache_limit='16G', max_batch=-1, device='cuda', moe_device="{'cuda':15,'cpu':85}", moe_experts=-1, cache_history='', cache_fast='', enable_thinking='', cuda_shared_expert='true', custom='', lora='', cache_dir='', dtype_config='', ori='', tool_call_parser='auto', chat_template='', model_name='', host='0.0.0.0', port=8000, api_key='sk-001', think='false', hide_input=False, dev_mode=False) CPU Instruction Info: [AVX512F: OFF] [AVX512_VNNI: OFF] [AVX512_BF16: OFF] Loading 100 Warmup... finish. INFO: Started server process [7032] INFO: Waiting for application startup. INFO: Application startup complete. INFO: Uvicorn running on http://0.0.0.0:8000 (Press CTRL+C to quit) auth INFO: 192.168.50.180:44452 - "GET /v1/models HTTP/1.1" 200 OK auth INFO: 192.168.50.180:38246 - "GET /v1/models HTTP/1.1" 200 OK auth 2025-09-30 17:19:22,456 7032 fastllm_completion.py[line:145] INFO: fastllm input message: [{'role': 'user', 'content': '读取https://ollama.com/library/gpt-oss，总结内容'}] 2025-09-30 17:19:22,868 7032 fastllm_completion.py[line:160] INFO: Created conversation: fastllm-E:\Model\Qwen3-Next-80B-A3B-Instruct-UD-Q4_K_M-c4daa98ac2d941fca3ab7af57e002df1, handle: 0 Fastllm KV Cache Limit: 16000.000000 MB. Fastllm KV Cache Token limit: 325520 tokens. Fastllm Prompt Token limit: 244140 tokens. Fastllm Batch limit: 512. INFO: 192.168.50.180:38256 - "POST /v1/chat/completions HTTP/1.1" 200 OK Auto tool parse detect type: hermes alive = 1, pending = 0, contextLen = 384, Speed: 0.135419 tokens / s. alive = 1, pending = 0, contextLen = 384, Speed: 14.564715 tokens / s. 2025-09-30 17:19:32,267 7032 fastllm_completion.py[line:374] INFO: Removed completed stream conversation from tracking: fastllm-E:\Model\Qwen3-Next-80B-A3B-Instruct-UD-Q4_K_M-c4daa98ac2d941fca3ab7af57e002df1 2025-09-30 17:19:32,269 7032 fastllm_completion.py[line:177] INFO: Abort request: fastllm-E:\Model\Qwen3-Next-80B-A3B-Instruct-UD-Q4_K_M-c4daa98ac2d941fca3ab7af57e002df1 auth INFO: 192.168.50.180:38800 - "GET /v1/models HTTP/1.1" 200 OK auth INFO: 192.168.50.180:38816 - "POST /v1/chat/completions HTTP/1.1" 422 Unprocessable Entity auth INFO: 192.168.50.180:38832 - "GET /v1/models HTTP/1.1" 200 OK

Sep 30 '25 09:09 Hunter6324

但是可以使用默认函数调用方式，是模型Qwen3-Next-80B-A3B不支持原生工具调用吗？

Sep 30 '25 09:09 Hunter6324

我用exllamav3 加载好像也不支持 tool call

Oct 15 '25 17:10 sorasoras