Image support inside complex types
Currently, each image field in a signature can only hold a single image.
E.g. this will work:
class ImageSignature(dspy.Signature):
    image1: dspy.Image = dspy.InputField()
    image2: dspy.Image = dspy.InputField()
But more complex types involving images won't:
class ImageSignature(dspy.Signature):
    images: List[dspy.Image] = dspy.InputField()

class ImageSignature(dspy.Signature):
    labeled_images: Dict[str, dspy.Image] = dspy.InputField()
This is due to how images are compiled into OpenAI-compatible messages: inside chat_adapter.py
we build one large list of content blocks, giving fields that contain an image_url special treatment:
{
    "content": [
        {
            "type": "text",
            "text": "...",
        },
        {
            "type": "image_url",
            "image_url": {"url": "..."}  # url is either an actual URL or the base64 data
        },
    ]
}
I do some fairly naive parsing inside ChatAdapter, and there is definitely a more elegant solution here.
#1763 addresses the List case, but I want a more generalized solution.
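To make the idea concrete, here is a rough sketch of what a generalized approach could look like: walk the field value recursively and emit content blocks as images are encountered, so lists, dicts, and nesting of the two all go through the same path. This is not how ChatAdapter works today. `Image` is a hypothetical stand-in for dspy.Image, and `to_content_blocks` is a made-up helper name used only for illustration.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass
class Image:
    """Hypothetical stand-in for dspy.Image: wraps a URL or base64 data URI."""
    url: str


def to_content_blocks(value: Any, label: str = "") -> List[Dict[str, Any]]:
    """Recursively flatten a value into OpenAI-style content blocks.

    Hypothetical helper, not part of DSPy. Each image becomes an
    image_url block, preceded by a text block naming its position.
    """
    if isinstance(value, Image):
        blocks: List[Dict[str, Any]] = []
        if label:
            blocks.append({"type": "text", "text": f"{label}:"})
        blocks.append({"type": "image_url", "image_url": {"url": value.url}})
        return blocks

    if isinstance(value, dict):
        blocks = []
        for key, item in value.items():
            child = f"{label}[{key!r}]" if label else str(key)
            blocks.extend(to_content_blocks(item, child))
        return blocks

    if isinstance(value, (list, tuple)):
        blocks = []
        for index, item in enumerate(value):
            blocks.extend(to_content_blocks(item, f"{label}[{index}]"))
        return blocks

    # Anything else is rendered as plain text.
    text = f"{label}: {value}" if label else str(value)
    return [{"type": "text", "text": text}]


# Usage: a Dict[str, Image] field and a List[Image] field.
labeled = to_content_blocks(
    {"cat": Image("https://example.com/cat.png")}, "labeled_images"
)
listed = to_content_blocks([Image("a"), Image("b")], "images")
```

The text block before each image is one possible design choice: it keeps the key or index visible to the model, which the flat content list would otherwise lose.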
cc @okhat