feat: Add video message part type support for Alibaba Cloud vision models
Summary
This PR adds support for a new video message part type in the eino schema to enable direct video frame transmission compatible with Alibaba Cloud's vision model services.
Background
Alibaba Cloud's Model Studio vision models accept video input in a specific format: the video is sent as an array of base64-encoded image frames. This change lets the eino framework pass such frames through its message schema so they can be forwarded to Alibaba Cloud's multimodal models.
Reference: https://help.aliyun.com/zh/model-studio/vision#80dbf6ca8fh6s
Changes
- Added `ChatMessagePartTypeVideo = "video"` constant for video message parts
- Added `ChatMessageVideo` struct to handle video frames with base64-encoded data
- Updated `ChatMessagePart` struct to include a `Video *ChatMessageVideo` field
- Updated message concatenation logic to handle video fields
API Changes
New Types
// ChatMessagePartTypeVideo represents video content with multiple frames
const ChatMessagePartTypeVideo ChatMessagePartType = "video"

// ChatMessageVideo handles video frames in base64 format for Alibaba Cloud compatibility
type ChatMessageVideo struct {
	Video    []string       `json:"video,omitempty"`     // Array of base64-encoded frames
	MIMEType string         `json:"mime_type,omitempty"` // Frame format (e.g., "image/jpeg")
	Extra    map[string]any `json:"extra,omitempty"`     // Additional metadata
}
The following three PRs form one set of changes:
- https://github.com/cloudwego/eino/pull/254
- https://github.com/meguminnnnnnnnn/go-openai/pull/3
- https://github.com/cloudwego/eino-ext/pull/280