feat: Image input (GPT-4 Vision Preview) support
Inquiry about image input support (e.g., using OpenAI's GPT-4 Vision Preview API)
Description
Hello,
I am currently using your framework for project development and would like to know if it supports image input functionalities. Specifically, I am interested in whether it's possible to integrate and utilize OpenAI's GPT-4 Vision Preview API for processing and analyzing image data.
Questions
- Image Input Support: Does the framework have built-in modules or functionalities that support image inputs?
- Usage Examples or Documentation: If supported, could you provide usage examples or links to documentation showing how to integrate and use this feature?
- Future Feature Plans: If image input is not currently supported, are there plans to include this functionality in future releases?
- Integration with Third-Party Libraries: If the framework does not support image inputs at the moment, do you have any recommended methods or third-party libraries that can be easily integrated to add image input capabilities?
Additional Information
To better meet my project requirements, I aim to leverage both image processing capabilities and the existing text processing features. If there is any example code or best practices available, I would greatly appreciate it if you could share them.
Thank you for your assistance!
- We do not currently have any multimodal inputs.
- There are definitely plans, but it would require a more in-depth look at our current API to figure out where image input fits. Other features are a bit higher on our TODO list.
- You could look into working directly with something like burn (which we plan to integrate with more deeply in the future). It is a much lower-level library, but it should have examples that can get you going!
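In the meantime, a common workaround is to call OpenAI's chat-completions endpoint directly, outside the framework. Below is a minimal sketch of the request body that GPT-4 Vision Preview expects; the helper function name and the example image URL are illustrative and not part of this framework's API.

```python
import json

def vision_request_payload(prompt: str, image_url: str) -> dict:
    """Build the JSON body for an OpenAI chat-completions call that
    includes an image, per the GPT-4 Vision Preview message format."""
    return {
        "model": "gpt-4-vision-preview",
        "messages": [
            {
                "role": "user",
                # Vision models take a list of content parts: text plus
                # one or more image_url entries.
                "content": [
                    {"type": "text", "text": prompt},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }
        ],
        "max_tokens": 300,
    }

# The payload can then be POSTed to https://api.openai.com/v1/chat/completions
# with an "Authorization: Bearer <OPENAI_API_KEY>" header via any HTTP client.
payload = vision_request_payload(
    "What is in this image?",
    "https://example.com/cat.png",  # placeholder URL
)
print(json.dumps(payload, indent=2))
```

This keeps the image call entirely outside the framework, so the text-processing features here can still be used on the model's response.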
Multi-modal rig agents are definitely an important feature on our minds, but it'll take a coordinated effort to build a suitable and elegant API for them!
Closed by #199