
xinference + glm4 tool call fails with a 400 error

Open JinCheng666 opened this issue 1 year ago • 5 comments

Routine checks

  • [x] I have confirmed there is no similar existing issue
  • [x] I have fully read the project README and documentation
  • [x] I am using my own key and have confirmed it works
  • [x] I understand and am willing to follow up on this issue, helping test and provide feedback
  • [x] I understand and accept the above, and I understand the maintainers' time is limited; issues that do not follow the rules may be ignored or closed

Your version

  • [x] Self-hosted deployment, version: 4.8.4-fix

Problem description and log screenshots

I deployed glm-4-9b with xinference and connected it to FastGPT through oneapi. Plain chat with glm4 works fine, but tool calls with glm4 fail with a 400 error.

Version info:

xinference: 0.12.2, fastgpt: 4.8.4-fix, oneapi: 0.6.6, glm4: glm-4-9b-chat

Plain chat with glm4 works fine:

(screenshot)

Tool calls with glm4 fail with a 400 error:

(screenshot)

config.json

```json
{
  "model": "glm-4-9b",
  "name": "glm-4-9b",
  "maxContext": 8192,
  "avatar": "/imgs/model/chatglm.svg",
  "maxResponse": 3000,
  "quoteMaxToken": 6000,
  "maxTemperature": 1.2,
  "charsPointsPrice": 0,
  "censor": false,
  "vision": false,
  "datasetProcess": false,
  "usedInClassify": true,
  "usedInExtractFields": true,
  "usedInToolCall": true,
  "usedInQueryExtension": true,
  "toolChoice": true,
  "functionCall": true,
  "customCQPrompt": "",
  "customExtractPrompt": "",
  "defaultSystemChatPrompt": "",
  "defaultConfig": {}
}
```

oneapi error log

```
[SYS] 2024/06/22 - 17:21:29 | model ratio not found: glm-4-9b
[INFO] 2024/06/22 - 17:21:29 | 2024062217212958147780023082485 | user 1 has enough quota 999222410797, trusted and no need to pre-consume
[ERR] 2024/06/22 - 17:21:29 | 2024062217212958147780023082485 | relay error happen, status code is 400, won't retry in this case
[ERR] 2024/06/22 - 17:21:29 | 2024062217212958147780023082485 | relay error (channel #13): bad response status code 400
[GIN] 2024/06/22 - 17:21:29 | 2024062217212958147780023082485 | 400 |     10.0611ms |     10.4.134.11 |    POST /v1/chat/completions
```

fastgpt error log

```
{
  message: '400 bad response status code 400 (request id: 2024062217212958147780023082485)',
  stack: 'Error: 400 bad response status code 400 (request id: 2024062217212958147780023082485)\n' +
    '    at eL.generate (/app/projects/app/.next/server/chunks/76750.js:15:67594)\n' +
    '    at av.makeStatusError (/app/projects/app/.next/server/chunks/76750.js:15:79337)\n' +
    '    at av.makeRequest (/app/projects/app/.next/server/chunks/76750.js:15:80260)\n' +
    '    at process.processTicksAndRejections (node:internal/process/task_queues:95:5)\n' +
    '    at async w (/app/projects/app/.next/server/chunks/75612.js:309:2105)\n' +
    '    at async Object.w [as tools] (/app/projects/app/.next/server/chunks/75612.js:305:4790)\n' +
    '    at async k (/app/projects/app/.next/server/chunks/75612.js:313:2241)\n' +
    '    at async Promise.all (index 0)\n' +
    '    at async E (/app/projects/app/.next/server/chunks/75612.js:313:2782)\n' +
    '    at async h (/app/projects/app/.next/server/pages/api/core/chat/chatTest.js:1:3266)'
}
```

xinference error log

```
2024-06-22 17:25:16,414 xinference.core.supervisor 43237 DEBUG    Enter get_model, args: (<xinference.core.supervisor.SupervisorActor object at 0x7f151234e2a0>, 'glm-4-9b'), kwargs: {}
2024-06-22 17:25:16,415 xinference.core.worker 43237 DEBUG    Enter get_model, args: (<xinference.core.worker.WorkerActor object at 0x7f15123c3e20>,), kwargs: {'model_uid': 'glm-4-9b-1-0'}
2024-06-22 17:25:16,415 xinference.core.worker 43237 DEBUG    Leave get_model, elapsed time: 0 s
2024-06-22 17:25:16,415 xinference.core.supervisor 43237 DEBUG    Leave get_model, elapsed time: 0 s
2024-06-22 17:25:16,416 xinference.core.supervisor 43237 DEBUG    Enter describe_model, args: (<xinference.core.supervisor.SupervisorActor object at 0x7f151234e2a0>, 'glm-4-9b'), kwargs: {}
2024-06-22 17:25:16,416 xinference.core.worker 43237 DEBUG    Enter describe_model, args: (<xinference.core.worker.WorkerActor object at 0x7f15123c3e20>,), kwargs: {'model_uid': 'glm-4-9b-1-0'}
2024-06-22 17:25:16,416 xinference.core.worker 43237 DEBUG    Leave describe_model, elapsed time: 0 s
2024-06-22 17:25:16,416 xinference.core.supervisor 43237 DEBUG    Leave describe_model, elapsed time: 0 s
```

JinCheng666 avatar Jun 22 '24 09:06 JinCheng666

I'm seeing the same problem with claude.

```
fastgpt | message: '400 messages: roles must alternate between "user" and "assistant", but found multiple "user" roles in a row (request id: 2024062912300280418215164311064)',
fastgpt | stack: 'Error: 400 messages: roles must alternate between "user" and "assistant", but found multiple "user" roles in a row (request id: 2024062912300280418215164311064)\n' +
fastgpt |   '    at eL.generate (/app/projects/app/.next/server/chunks/76750.js:15:67594)\n' +
fastgpt |   '    at av.makeStatusError (/app/projects/app/.next/server/chunks/76750.js:15:79337)\n' +
fastgpt |   '    at av.makeRequest (/app/projects/app/.next/server/chunks/76750.js:15:80260)\n' +
fastgpt |   '    at process.processTicksAndRejections (node:internal/process/task_queues:95:5)\n' +
fastgpt |   '    at async Object.P [as chatNode] (/app/projects/app/.next/server/chunks/75612.js:312:686)\n' +
fastgpt |   '    at async k (/app/projects/app/.next/server/chunks/75612.js:313:2241)\n' +
fastgpt |   '    at async Promise.all (index 5)\n' +
fastgpt |   '    at async E (/app/projects/app/.next/server/chunks/75612.js:313:2782)\n' +
fastgpt |   '    at async C (/app/projects/app/.next/server/pages/api/v1/chat/completions.js:63:11920)\n' +
fastgpt |   '    at async /app/projects/app/.next/server/pages/api/core/app/list.js:1:5593'
fastgpt | }
fastgpt | [Info] 2024-06-29 10:30:03 Request finish /api/v1/chat/completions, time: 1195ms
```

slot181 avatar Jun 29 '24 10:06 slot181

> I'm seeing the same problem with claude.

Function calling isn't supported for claude.

c121914yu avatar Jun 30 '24 01:06 c121914yu

Updated xinference to 0.12.3 and the problem still occurs.

JinCheng666 avatar Jul 01 '24 09:07 JinCheng666

> Updated xinference to 0.12.3 and the problem still occurs.

Set these two parameters to false in config.json:

```json
  "toolChoice": false,
  "functionCall": false,
```

romejiang avatar Jul 10 '24 19:07 romejiang

Has this been solved? I deployed a qw2-7b model on xinference and get the same error, but it works fine in dify:

```
message: '400 status code (no body)',
stack: 'Error: 400 status code (no body)\n' +
  '    at APIError.generate (webpack-internal:///(api)/../../node_modules/.pnpm/[email protected][email protected]/node_modul ror.mjs:57:20)\n' +
  '    at OpenAI.makeStatusError (webpack-internal:///(api)/../../node_modules/.pnpm/[email protected][email protected]/node_ ai/core.mjs:292:65)\n' +
  '    at OpenAI.makeRequest (webpack-internal:///(api)/../../node_modules/.pnpm/[email protected][email protected]/node_modu ore.mjs:335:30)\n' +
  '    at process.processTicksAndRejections (node:internal/process/task_queues:95:5)\n' +
  '    at async runToolWithToolChoice (webpack-internal:///(api)/../../packages/service/core/workflow/dispatch/agent/run ice.ts:95:24)\n' +
```

hellolixy avatar Jul 11 '24 13:07 hellolixy

Many people have hit this problem, and I looked into it recently. The 400 is caused by an invalid parameter combination. Tool calls work in the workflow editor's single-step debug because single-step debug sends stream=false, so xinference (or the OpenAI API) does not complain. But in a direct chat, or when debugging the whole app, stream is true; with tools also passed, the API returns 400, because the tool-selection request does not support stream=true. @c121914yu The latest FastGPT has this problem too, please confirm. When making the chat request, if the tools parameter is passed, force stream to false; then, when sending the tool results back to the model, use stream=true. Wouldn't that fix it?
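The proposed fix can be sketched as a small request builder: force a non-streaming request whenever tools are passed, and stream only the follow-up call. This is a hypothetical illustration, not FastGPT's actual code; the function name and shape are made up for the example.

```python
def build_chat_params(messages, tools=None, stream=True):
    """Build kwargs for an OpenAI-compatible /v1/chat/completions call.

    Hypothetical sketch of the fix proposed above: some backends
    (e.g. older xinference releases) reject stream=True together
    with tools, so stream is forced off for the tool-selection call.
    """
    params = {"messages": messages, "stream": stream}
    if tools:
        params["tools"] = tools
        params["stream"] = False  # tool-selection request must not stream
    return params
```

Under this scheme the first request (with tools) goes out with stream=false; the second request, which feeds the tool results back to the model, keeps stream=true.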

mojin504 avatar Aug 21 '24 15:08 mojin504

> The 400 comes from the tool-selection request not supporting stream=true; force stream to false when tools are passed.

Since GPT can be set to true, why not keep it true? The right fix is to change the middle layer so it is compatible with stream=true, even if it returns everything in one go. As far as I saw, xinference already supports true.
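The compatibility shim suggested here (accept stream=true but return everything in one go) amounts to wrapping a complete, non-streaming completion in a single SSE chunk. The sketch below is illustrative only and is not xinference's actual implementation:

```python
import json


def as_single_chunk_stream(completion: dict):
    """Wrap a complete (non-streaming) chat completion into a
    one-chunk SSE stream, so clients that asked for stream=True
    still get a well-formed event stream. Illustrative sketch.
    """
    chunk = {
        "id": completion.get("id"),
        "object": "chat.completion.chunk",
        "choices": [
            {
                "index": c["index"],
                "delta": c["message"],  # emit the whole message as one delta
                "finish_reason": c["finish_reason"],
            }
            for c in completion["choices"]
        ],
    }
    yield f"data: {json.dumps(chunk)}\n\n"
    yield "data: [DONE]\n\n"
```

A middle layer with this shim can accept stream=true unconditionally: it answers streaming-capable backends transparently and converts non-streaming tool-call responses into a single chunk plus the `[DONE]` sentinel.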

c121914yu avatar Aug 22 '24 01:08 c121914yu

xinference added support for this in one of the 0.13.x releases; I don't remember the exact minor version. Please update xinference to the latest version and try again. This issue will be closed.

JinCheng666 avatar Aug 22 '24 01:08 JinCheng666

For a locally deployed glm4 model, the code provided on the official site has no /v1/embeddings API. How do I solve this?

gq2010 avatar Jan 15 '25 12:01 gq2010

> For a locally deployed glm4 model, the code provided on the official site has no /v1/embeddings API. How do I solve this?

You also need to deploy an Embedding model.
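In FastGPT terms, that means serving a separate embedding model (xinference can host embedding-type models alongside chat models) and registering it under `vectorModels` in config.json. The fragment below is an illustrative sketch only; the model name and token limits are assumptions, not values from this thread:

```json
"vectorModels": [
  {
    "model": "bge-large-zh-v1.5",
    "name": "bge-large-zh-v1.5",
    "charsPointsPrice": 0,
    "defaultToken": 512,
    "maxToken": 3000,
    "weight": 100
  }
]
```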

duanluan avatar Sep 15 '25 03:09 duanluan