[BUG] Incorrect `content` parameter type for `system` role when using multimodal Doubao models

Open nanhuayu opened this issue 10 months ago • 0 comments

Environment Information

Chatbox Version: v1.9.8
Operating System: Windows 10
Model Name: doubao-1-5-vision-pro-32k-250115

Issue Description
When a request containing an image is sent to a multimodal model, the API returns an InvalidParameter error. The model expects the content field of the system message to be a string, but when multimodal input is detected, the current implementation serializes system.content as an array of objects.

Reproduction Steps

1. Configure the multimodal model API.
2. Create a request with an image:

{
  "messages": [
    {
      "role": "system",
      "content": [{"type": "text", "text": "You are an assistant"}]
    },
    {
      "role": "user",
      "content": [
        {"type": "text", "text": "Identify the image"},
        {"type": "image_url", "image_url": {"url": "data:image/jpeg;base64,"}}
      ]
    }
  ],
  "model": "doubao-1-5-vision-pro-32k-250115"
}

3. Send the request.

Expected Result
Successful processing of multimodal input
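
For comparison, here is a rough sketch of the request shape that the error message suggests the API would accept: the system content is a plain string, while the user content stays an array. This is inferred only from the error text and has not been verified against the API. The prompt strings are placeholders, and the base64 image data is left elided as in the reproduction request.

```python
import json

# Assumed-acceptable payload: system content is a plain string,
# user content keeps the multimodal array form.
expected_payload = {
    "model": "doubao-1-5-vision-pro-32k-250115",
    "messages": [
        {"role": "system", "content": "You are an assistant"},
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Identify the image"},
                # Base64 data is elided here, as in the report.
                {"type": "image_url", "image_url": {"url": "data:image/jpeg;base64,"}},
            ],
        },
    ],
}

print(json.dumps(expected_payload, indent=2, ensure_ascii=False))
```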

Actual Result
Error response:

{
  "error": {
    "code": "InvalidParameter",
    "message":  "The parameter `messages.content` specified in the request are not valid: expected a string, but got `[map[text:You are a helpful assistant. You can help me by answering my questions. You can also ask me questions. type:text]]` instead. Request id: 021739969154753225aa6952d6004241aafdfe13638693e88c6b0",
    "param": "messages.content",
    "type": "BadRequest"
  }
}

Additional Information

The error indicates that the system role's content field must be a string (e.g., "You are a helpful assistant").
The current implementation erroneously converts every content field to an array when multimodal input is detected.
Request ID: 021739969154753225aa6952d6004241aafdfe13638693e88c6b0

Proposed Fix
Add special handling for the system role's content field to ensure compliance with the target model's requirements:

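if message.role == "system" and isinstance(content, list):
    # Join all text parts into the plain string the API expects
    content = "\n".join(part["text"] for part in content if part.get("type") == "text")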
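
A slightly fuller sketch of the same idea, written as a standalone Python function over a list of message dicts. The function name `flatten_system_content` and the choice to join multiple text parts with a newline are assumptions made for illustration. This is not Chatbox's actual code, and the real fix would have to live wherever the request body is built.

```python
from typing import Any


def flatten_system_content(messages: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Return a copy of `messages` in which system content arrays become strings.

    Hypothetical helper: only messages with role "system" are touched, so
    user messages keep their multimodal array content.
    """
    normalized = []
    for message in messages:
        content = message.get("content")
        if message.get("role") == "system" and isinstance(content, list):
            # Keep only the text parts and join them (newline is an assumption).
            text = "\n".join(
                part.get("text", "")
                for part in content
                if isinstance(part, dict) and part.get("type") == "text"
            )
            # Build a new dict so the caller's message is not mutated.
            message = {**message, "content": text}
        normalized.append(message)
    return normalized


messages = [
    {"role": "system", "content": [{"type": "text", "text": "You are an assistant"}]},
    {
        "role": "user",
        "content": [
            {"type": "text", "text": "Identify the image"},
            {"type": "image_url", "image_url": {"url": "data:image/jpeg;base64,"}},
        ],
    },
]

fixed = flatten_system_content(messages)
print(fixed[0]["content"])
```

Leaving the user message untouched matters here, since the image part can only be expressed in the array form.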

Feb 20 '25 01:02 nanhuayu