[BUG] Incorrect `content` parameter type for `system` role when using multimodal Doubao models
Environment Information
- Chatbox Version: v1.9.8
- Operating System: Windows 10
- Model Name: doubao-1-5-vision-pro-32k-250115
Issue Description
When sending requests containing images using a multimodal model, the system returns an InvalidParameter error. The model expects the content field for the system role to be a string type, but the current implementation incorrectly serializes system.content as an array of dictionaries when multimodal input is detected.
Reproduction Steps
- Configure the multimodal model API
- Create a request with an image:
{
"messages": [
{
"role": "system",
"content": [{"type": "text", "text": "You are an assistant"}]
},
{
"role": "user",
"content": [
{"type": "text", "text": "Identify the image"},
{"type": "image_url", "image_url": {"url": "data:image/jpeg;base64,"}}
]
}
],
"model": "doubao-1-5-vision-pro-32k-250115"
}
- Send the request
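To confirm locally that a payload has the problematic shape before sending it, a minimal sketch along these lines may help. The helper name `find_non_string_system_content` is hypothetical and not part of Chatbox; it only inspects the request body and makes no network call.

```python
from typing import Any


def find_non_string_system_content(payload: dict[str, Any]) -> list[int]:
    """Return indexes of system messages whose content is not a plain string."""
    return [
        index
        for index, message in enumerate(payload.get("messages", []))
        if message.get("role") == "system"
        and not isinstance(message.get("content"), str)
    ]


# Same shape as the request in the reproduction steps (image data omitted).
payload = {
    "messages": [
        {"role": "system", "content": [{"type": "text", "text": "You are an assistant"}]},
        {"role": "user", "content": [{"type": "text", "text": "Identify the image"}]},
    ],
    "model": "doubao-1-5-vision-pro-32k-250115",
}
offending = find_non_string_system_content(payload)
```

A non-empty result means at least one `system` message would be sent with array content, which is the shape the error response complains about.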
Expected Result
Successful processing of multimodal input
Actual Result
Error response:
{
"error": {
"code": "InvalidParameter",
"message": "The parameter `messages.content` specified in the request are not valid: expected a string, but got `[map[text:You are a helpful assistant. You can help me by answering my questions. You can also ask me questions. type:text]]` instead. Request id: 021739969154753225aa6952d6004241aafdfe13638693e88c6b0",
"param": "messages.content",
"type": "BadRequest"
}
}
Additional Information
- The error indicates the `system` role's `content` field requires a string type (e.g., `"You are a helpful assistant"`)
- Current implementation erroneously converts all `content` fields to array structures when multimodal input is detected
- Request ID: 021739969154753225aa6952d6004241aafdfe13638693e88c6b0
Proposed Fix
Add special handling for the system role's content field to ensure compliance with the target model's requirements:
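if message.role == "system" and isinstance(content, list):
    # Join all text parts into one plain string instead of keeping only the first
    content = "".join(part["text"] for part in content if part.get("type") == "text")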
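A slightly fuller sketch of the same idea, written as a standalone function over the request's `messages` list. The function name `normalize_system_content` and the choice to join multiple text parts with a newline are assumptions for illustration, not Chatbox's actual implementation; `user` messages keep their array content so image parts are preserved.

```python
from typing import Any


def normalize_system_content(messages: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Return a copy of messages in which system content is a plain string."""
    normalized = []
    for message in messages:
        content = message.get("content")
        if message.get("role") == "system" and isinstance(content, list):
            # Keep only the text parts and join them; non-text parts are dropped.
            text = "\n".join(
                part.get("text", "")
                for part in content
                if isinstance(part, dict) and part.get("type") == "text"
            )
            # Build a new dict so the caller's message is not mutated.
            message = {**message, "content": text}
        normalized.append(message)
    return normalized
```

Applying this just before serialization would leave already-string system prompts untouched, so it should be safe for models that are not multimodal as well.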