dspy icon indicating copy to clipboard operation
dspy copied to clipboard

Image support inside complex types

Open isaacbmiller opened this issue 3 months ago • 6 comments

Currently, only you can only pass a single image at a time in a signature.

E.g. this will work

class ImageSignature(dspy.Signature):
    image1: dspy.Image = dspy.InputField()
    image2: dspy.Image = dspy.InputField()

But any more complex types involving images wont:

class ImageSignature(dspy.Signature):
    images: List[dspy.Image] = dspy.InputField()

class ImageSignature(dspy.Signature):
    labeled_images: Dict[str, dspy.Image] = dspy.InputField()

This is due to how images are compiled into OAI compatible messages, where inside chat_adapter.py we create a large list of content blocks by giving fields with an image_url special privileges:

{
    "content": [{
         "type": "text",
         "text": "...",
    },
    {
         "type": "image_url"
         "image_url": {"url": "..."} # url is either an actual url or the base64 data
    }]
}

I do some fairly naive parsing inside ChatAdapter, and there is definitely a more elegant solution here. #1763 addresses the List case, but I want a more generalized solution.

cc @okhat

isaacbmiller avatar Nov 06 '24 17:11 isaacbmiller