
Multi-modal support for vision models such as GPT-4 vision

Open cmungall opened this issue 2 years ago • 44 comments

https://platform.openai.com/docs/guides/vision

I think this is best handled by command line options --image and --image-urls to either encode and pass as base64, or to pass a URL.

cmungall avatar Nov 07 '23 00:11 cmungall

Indeed this would be awesome. Does it require changes to llm or can it be done in a plugin?

tomviner avatar Nov 08 '23 00:11 tomviner

I suspect we'll be seeing more multimodal models so inclusion in core makes sense, but I defer to @simonw on this!

cmungall avatar Nov 08 '23 00:11 cmungall

I've been thinking about this a lot.

The challenge here is that we need to be able to mix both text and images together in the same prompt - because you can call GPT-4 vision with this kind of thing:

Take a look at this image:

<image 1>

Now compare it to this:

<image 2>

My first instinct was to support syntax like this:

llm -m gpt-4-vision \
  "Take a look at this image:" \
  -i image1.jpeg \
  "Now compare it to this:" \
  -i https://example.com/image2.png

Note that the -i/--image option here takes a filename or a URL, distinguishing the two by checking whether the value corresponds to a file on disk.
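That detection step could be a very small helper - a minimal sketch, assuming a hypothetical `resolve_image_argument` function (not part of llm):

```python
from pathlib import Path

def resolve_image_argument(value: str) -> dict:
    """Hypothetical helper: -i values that exist on disk are treated as
    files; anything else is assumed to be a URL."""
    if Path(value).exists():
        return {"type": "file", "path": value}
    return {"type": "url", "url": value}
```

One edge case this glosses over: a local file whose name looks like a URL would still be picked up as a file, which is probably the right precedence.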

But... I don't think I can implement this, because Click really, really doesn't want to provide a mechanism for storing and retrieving the order of different arguments and parameters relative to each other:

  • https://github.com/pallets/click/issues/567
  • https://github.com/pallets/click/issues/1427

I spent some time trying to get this to work with a custom Click command class and parse_args() but determined that I'd effectively have to re-implement the whole Click argument parser from scratch to handle cases like --enable-logging boolean flags and -p key value multi-value parameters. This doesn't feel worthwhile to me.

So now I'm considering the following instead:

llm "look at this image" -i image.jpeg --tbc
llm -c "and compare it with" -i https://example.com/image.png

The trick here is the new --tbc flag, which stands for "to be continued". It causes the prompt to be stored but NOT executed against the model yet - instead, any following llm -c calls can be used to stack up more context in the prompt, which will be executed the first time --tbc is NOT used.
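The queueing logic could work something like this sketch - the storage location and `add_fragment` function are hypothetical, purely to illustrate the store-then-flush flow:

```python
import json
from pathlib import Path

# Hypothetical location for the pending prompt; llm would presumably
# use its own state directory instead.
PENDING = Path("/tmp/llm-pending-prompt.json")

def add_fragment(fragment: dict, tbc: bool):
    """Append a prompt fragment. With tbc=True the fragment is stored and
    nothing is sent; once tbc=False, return the full stacked prompt and
    clear the queue so it can be executed against the model."""
    fragments = json.loads(PENDING.read_text()) if PENDING.exists() else []
    fragments.append(fragment)
    if tbc:
        PENDING.write_text(json.dumps(fragments))
        return None  # stored, not yet executed
    if PENDING.exists():
        PENDING.unlink()
    return fragments  # ready to send
```

So `llm "look at this image" -i image.jpeg --tbc` would store one text fragment and one image fragment, and the next `llm -c ...` without --tbc would flush the whole stack in one request.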

On a related note: llm chat could also support this - maybe letting you do this kind of thing:

llm chat -m gpt-4-vision
look at this image
!image image.jpeg

For multi-line chats you would use the existing !multi command:

llm chat -m gpt-4-vision
!multi
look at this image
!image image.jpeg
and compare it with
!image https://example.com/image.png
!end

simonw avatar Nov 08 '23 03:11 simonw

Crucially, I want to leave the door open for other LLM models provided by plugins - like maybe https://github.com/SkunkworksAI/BakLLaVA - to also support multi-modal inputs like this.

So I think the model class would have a supports_images = True property it could set to tell LLM that images are supported - otherwise using -i/--image would return an error.
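A minimal sketch of that capability check - the class and function names here are illustrative, not llm's actual API:

```python
class Model:
    # Hypothetical base class: plugins override this flag to advertise
    # that they can accept image inputs.
    supports_images = False

class VisionModel(Model):
    supports_images = True

def validate_images(model, images):
    """Reject -i/--image options for models that don't declare support."""
    if images and not model.supports_images:
        raise ValueError(
            f"{model.__class__.__name__} does not support -i/--image"
        )
```

The nice property of an opt-in flag is that existing plugins need no changes: they default to text-only and -i fails fast with a clear error.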

simonw avatar Nov 08 '23 03:11 simonw

One note about the --tbc thing is that we can get basic image support working without it - we could implement this and say that support for multiple images in the same prompt is coming later:

llm -m gpt-4-vision "Caption for this image" -i image.jpeg

simonw avatar Nov 08 '23 03:11 simonw

This work is blocked on:

  • #325

simonw avatar Nov 08 '23 03:11 simonw

Would be amazing to get this working with a Bakllava local model - relevant example code using llama.cpp here https://github.com/cocktailpeanut/mirror/blob/main/app.py

simonw avatar Nov 08 '23 20:11 simonw

Another claimed bakllava example (not tried it yet), this one using llama-cpp-python: https://advanced-stack.com/resources/multi-modalities-inference-using-mistral-ai-llava-bakllava-and-llama-cpp.html

[Actually uses `from llm_core.llm import LLaVACPPModel`. Trying to run the example code on my MacBook Pro M2 16GB, it just falls over; other chat models of a similar size seem to work okay.]

psychemedia avatar Nov 13 '23 09:11 psychemedia

@simonw how about f-strings/templating style?

llm "look at this image {src_image} and compare it to {compare_image}" \
    --infile src_image=sample.jpeg --infile compare_image=known.jpeg
import click

def _infiles_to_dict(
    ctx: click.Context, attribute: click.Option, infiles: tuple[str, ...]
) -> dict[str, str]:
    # Split each key=filename pair on the first "=" only, so filenames
    # containing "=" still work
    return {k: v for k, v in (f.split("=", 1) for f in infiles)}

@click.command()
@click.option(
    "-i",
    "--infile",
    multiple=True,
    callback=_infiles_to_dict,
    help="Input files in the form key=filename. Multiple files can be included.",
)
def prompt(infile: dict[str, str]) -> None:
    ...

Misc thoughts:

  • I do like the --tbc idea as well.
  • --image makes sense for now, but it might later change to --infile once models can take audio, video, and arbitrary multi-modal documents. The model would have to specify which formats it accepts. The prompt might then have to be `llm --infile {video.mp4:v}` unless some auto-detection of file format is done.

neomanic avatar Dec 04 '23 04:12 neomanic

https://github.com/tbckr/sgpt

SGPT additionally facilitates the utilization of the GPT-4 Vision API. Include input images using the -i or --input flag, supporting both URLs and local images.

$ sgpt -m "gpt-4-vision-preview" -i "https://upload.wikimedia.org/wikipedia/en/c/cb/Marvin_%28HHGG%29.jpg" "what can you see on the picture?"
The image shows a figure resembling a robot with a humanoid form. It has a
$ sgpt -m "gpt-4-vision-preview" -i pkg/fs/testdata/marvin.jpg "what can you see on the picture?"
The image shows a figure resembling a robot with a sleek, metallic surface. It

It is also possible to combine URLs and local images:

$ sgpt -m "gpt-4-vision-preview" -i "https://upload.wikimedia.org/wikipedia/en/c/cb/Marvin_%28HHGG%29.jpg" -i pkg/fs/testdata/marvin.jpg "what is the difference between those two pictures"
The two images provided appear to be identical. Both show the same depiction of a

NightMachinery avatar Dec 15 '23 13:12 NightMachinery

I built a prototype of this today, in the image-experimental branch - just for OpenAI so far using docs on https://platform.openai.com/docs/guides/vision but I want to also ship support for Gemini and Claude (and eventually local models like LLaVA).

I gave it this image:

[image]

And ran this:

llm -m 4v 'describe this image' -i image.jpg -o max_tokens 200

And got back:

This image shows a young pig being held by a person. The pig has a light brown coat with some bristle-like hair and a prominent snout that is characteristic of pigs. It appears to be a juvenile, given its size. The pig's snout is a bit dirty, suggesting it may have been rooting around in the ground, which is common pig behavior. The person is out of frame with only their arm visible, dressed in a red garment with a seemingly soft texture. They are holding the pig securely against their body. The background indicates that this is an indoor setting with wooden structures, possibly inside a barn or a similar animal enclosure.

simonw avatar Mar 04 '24 21:03 simonw

Lots still to do on this - I want it to support URLs, file paths, or `-` (stdin) as inputs. Those should then be made available to the model in a form where models like GPT-4 that support URL images can pass the URL in directly, while models like Claude 3 that only support base64 fetch that URL and then send it base64-encoded instead.

Maybe have a thing with Pillow as an optional dependency which can resize the images before sending them?

Have to decide what to do about logs. I think I need to log the images to the SQLite database (maybe in a new BLOB table) because I need them in conversations so I can send follow-up prompts - but that could take a lot of space. So I need to add tooling that helps users clean up old images from their database if it gets too big.

simonw avatar Mar 04 '24 21:03 simonw

I am going to pass around an image object that has a .url property that may or may not return a URL string (otherwise None) and a .bytes and .base64 property that ALWAYS return binary data or that data base64 encoded.

That way plugins like OpenAI that can be sent URLs can use .url first and fall back to .base64 if the URL is not available, and plugins like Claude 3 can use base64 every time.

I'm tempted to offer a .resized(max_width, max_height) method which returns a Pillow resized image for models that know there is a maximum or recommended size limit and want to send a smaller request.
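The property names here follow the comment above, but this class itself is just a sketch, not llm's implementation - URL fetching and the Pillow-based .resized() are stubbed out:

```python
import base64

class ImageAttachment:
    """Sketch of the image object described above."""

    def __init__(self, url=None, data=None):
        self._url = url
        self._data = data

    @property
    def url(self):
        # May be None - e.g. for images from disk or stdin
        return self._url

    @property
    def bytes(self):
        if self._data is None:
            # A real implementation would fetch self._url here
            raise NotImplementedError("URL fetching not sketched")
        return self._data

    @property
    def base64(self):
        # Always available: derived from .bytes
        return base64.b64encode(self.bytes).decode("ascii")
```

With this shape, the OpenAI plugin checks `img.url` first and falls back to `img.base64`, while the Claude plugin uses `img.base64` unconditionally.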

simonw avatar Mar 04 '24 23:03 simonw

Idea: rather than store the images in the database, I'll store the path to the files on disk.

If you attempt to continue a conversation where the file paths no longer resolve to existing images, you'll get an error.

simonw avatar Mar 06 '24 04:03 simonw

Would be nice if the API server gave you a reference for every uploaded image that you could just refer back to

tomviner avatar Mar 06 '24 14:03 tomviner

came here looking for non-text API endpoints... i was hoping to have a direct view into the audio and text-to-speech API endpoints, in particular.

so while it would be nice to have llm have a chat-like interface to interleave images, maybe an easier first step would be to have just a simple "prompt-to-image", "prompt-to-audio", "audio-to-text" kind of commands?

anarcat avatar Mar 07 '24 04:03 anarcat

Quick survey on Twitter: https://twitter.com/simonw/status/1768445876274635155

Consensus is loosely to do image and then text, rather than text then image:

[{"type": "image_url", "image_url": {"url": "..."}}, {"type": "text", "text": "Describe image"}]

simonw avatar Mar 15 '24 02:03 simonw

Claude 3 Haiku is cheaper than GPT-3.5 Turbo and supports image inputs - a great incentive to finally get this feature shipped!

simonw avatar Mar 15 '24 02:03 simonw

https://twitter.com/invisiblecomma/status/1768561708090417603

The Claude Vision docs recommend image first

https://docs.anthropic.com/claude/docs/vision#image-best-practices

Image placement: Just as with document-query placement, Claude works best when images come before text. Images placed after text or interpolated with text will still perform well, but if your use case allows it, we recommend image-then-text structure. See vision prompting tips for more details.

simonw avatar Mar 15 '24 12:03 simonw

the maximum allowed image file size is 5MB per image

Should I enforce this for the Claude model? Easiest to let Claude API return an error at first.

I'm not yet sure if LLM should depend on Pillow and use it to resize large images before sending them.

Maybe a plugin hook to allow things like resizing and HEIC conversion would be useful?
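If client-side enforcement does go in, it could be as simple as a byte-length check before sending - this sketch assumes "5MB" means binary megabytes, which is a guess about how the API measures it:

```python
MAX_IMAGE_BYTES = 5 * 1024 * 1024  # Claude's documented 5MB per-image limit

def check_image_size(data: bytes, limit: int = MAX_IMAGE_BYTES) -> None:
    """Fail fast locally rather than waiting for the API to reject the request."""
    if len(data) > limit:
        raise ValueError(
            f"Image is {len(data)} bytes; this model accepts at most {limit}"
        )
```

A resize hook could then catch this error and shrink the image instead of surfacing it to the user.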

simonw avatar Mar 15 '24 12:03 simonw

https://cloud.google.com/vertex-ai/generative-ai/docs/multimodal/design-multimodal-prompts#prompt-design-fundamentals

Put your image first for single-image prompts: While Gemini can handle image and text inputs in any order, for prompts containing a single image, it might perform better if that image (or video) is placed before the text prompt. However, for prompts that require images to be highly interleaved with texts to make sense, use whatever order is most natural.

simonw avatar Mar 16 '24 15:03 simonw

the maximum allowed image file size is 5MB per image

Should I enforce this for the Claude model? Easiest to let Claude API return an error at first.

I'm not yet sure if LLM should depend on Pillow and use it to resize large images before sending them.

Maybe a plugin hook to allow things like resizing and HEIC conversion would be useful?

IMO llm should compress/resize images to avoid errors and make things easy to use. You could add an option --no-image-resize which disables this behavior, and people who care will disable it. The average user (myself included) just wants the image to go to the model; the error is unhelpful.

BTW, OpenAI supports both low and high detail levels for processing images. Does Anthropic have something similar? Is this exposed in llm?

NightMachinery avatar Mar 18 '24 14:03 NightMachinery

I made a simple cli for vision, if anyone needs it before llm-vision is ready. Only supports GPT4 for now. :( https://github.com/irthomasthomas/llm-vision

It supports specifying an output format that prompts the model to generate markdown, or json in addition to plain text. One thing odd about gpt-4-vision is that it doesn't know you have given it an image, and sometimes doesn't believe it has vision capabilities unless you give it a phrase like 'describe the image'. But, if you want to extract an image to json, then a text description isn't very useful. So, I prompt it with 'describe the image in your head, then write the json document'.

There's also a work-in-progress gpt4-vision-screen-compare.py - this takes a screenshot every few seconds, compares its similarity with the last screenshot, and if they differ enough it sends both to the model asking it to explain the changes between them.

And here's a demo of what you can do with it: https://twitter.com/xundecidability/status/1763219017160867840

Problem: I wanted to import a blocked-domains list from Kagi into Bing Custom Search.

  • Discovered that Bing Custom Search requires manual data entry of blocked domains.

Solution: a little bash script that screenshots the Kagi blocked-domains list, has GPT-4 Vision stream a text list of the domains, and uses xdotool to type the domains into the Bing webpage as they stream in.

irthomasthomas avatar Mar 29 '24 20:03 irthomasthomas

Current status:

  • Branch has -i support
  • I have GPT-4 Vision support, plus branches of llm-gemini and llm-claude-3

The main sticking point is what to do with the SQLite logging mechanism

It's important that llm -c "..." works for sending follow-up prompts. This means it needs to be able to send the image again.

Some ways that could work:

  • For images on disk, store the path to that image on disk. Use that again in follow-up prompts, and throw a hard error if the file is no longer visible.
  • Some models support URLs. For public URLs to images I can store those URLs, and let the APIs themselves error if the URLs are 404ing
  • Images fed in to standard input could be stored in the database, maybe as BLOB columns
  • But since being able to compare prompts responses is so useful, maybe I should store images from disk in BLOB too? The cost in terms of SQLite space taken up may be worth it.
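The BLOB-table option from that list could look roughly like this - the schema and function names are hypothetical, just to show how follow-up prompts would get the image bytes back:

```python
import sqlite3

def store_image(db: sqlite3.Connection, response_id: str, data: bytes) -> int:
    """Log image bytes against a response so llm -c can resend them later."""
    db.execute(
        "CREATE TABLE IF NOT EXISTS images ("
        "id INTEGER PRIMARY KEY, response_id TEXT, content BLOB)"
    )
    cursor = db.execute(
        "INSERT INTO images (response_id, content) VALUES (?, ?)",
        (response_id, data),
    )
    return cursor.lastrowid

def load_images(db: sqlite3.Connection, response_id: str) -> list:
    """Fetch the stored images for a response, for follow-up prompts."""
    rows = db.execute(
        "SELECT content FROM images WHERE response_id = ?", (response_id,)
    )
    return [row[0] for row in rows]
```

The path and URL options would slot into the same table by storing a reference string instead of a BLOB, at the cost of the hard-error case when the file has moved.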

simonw avatar Apr 04 '24 01:04 simonw

Very nice! I'm not sure I'd want to include the image in every turn, though. I send a lot of full screenshots and my poor connection doesn't help. What I do currently is generate the description with a Python script and pipe that to llm to chat about it. If it's important I might include the file path in the prompt. Then the llm can act on the file, and I can search for the file in the logs DB.

Cheers, Thomas


irthomasthomas avatar Apr 04 '24 11:04 irthomasthomas

@simonw Just add an option --image-log-mode which can be set to db-blob. By default, don't store them; they would take up disk space for what are probably junk files.

NightMachinery avatar Apr 04 '24 12:04 NightMachinery

Another open question: how should this work in chat?

I'm inclined to add !image path-to-image.jpg as a thing you can use in chat to reference an image.

But then should it be submitted the moment you hit enter, or should you get the opportunity to add a prompt afterwards? I think adding a prompt afterwards makes sense.

Also should !image be allowed inside !multi? I'm not sure. If it IS, then how would you send that raw text to a model e.g. as part of a longer code sample you are pasting in?
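A sketch of how the chat loop might tokenize its input - the `parse_chat_input` function is hypothetical, and it also illustrates the !multi ambiguity: any pasted line starting with "!image " would be swallowed as a directive unless some escape is added:

```python
def parse_chat_input(lines):
    """Turn chat-session lines into prompt fragments, treating
    '!image <path-or-url>' as an image reference."""
    fragments = []
    for line in lines:
        if line.startswith("!image "):
            source = line[len("!image "):].strip()
            fragments.append({"type": "image", "source": source})
        else:
            fragments.append({"type": "text", "text": line})
    return fragments
```

Submitting on a following prompt (rather than the moment !image is entered) falls out naturally here: the fragments just accumulate until the user sends a plain line.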

simonw avatar Apr 04 '24 21:04 simonw

@simonw Just add an option --image-log-mode which can be set to db-blob. By default, don't store them, it will take disk space for probably junk files.

Yeah, I'm beginning to think I may need to add a whole settings/preferences mechanism to help solve this. llm settings set image_log_mode blob kind of thing.

simonw avatar Apr 04 '24 21:04 simonw

@simonw

I'm inclined to add !image path-to-image.jpg as a thing you can use in chat to reference an image.

Perhaps you can use a TUI hotkey? E.g., Ctrl-i for inserting images. Though this will quickly spiral out of control ... E.g., should the TUI present a dialogue for selecting files?

The ideal case is to be able to just paste, and detect images from the clipboard. But this seems impossible to do using native paste. Perhaps you can add a custom hotkey for pasting that checks the clipboard.

I have some functions for macOS that paste images, e.g.,

class='«class PNGf»'
osascript -e "tell application \"System Events\" to ¬
                  write (the clipboard as ${class}) to ¬
                          (make new file at folder \"${dir}\" with properties ¬
                                  {name:\"${name}\"})"

NightMachinery avatar Apr 04 '24 23:04 NightMachinery

For pasting I think I'll hold off until I have a web UI working - much easier to handle paste there (e.g. https://tools.simonwillison.net/ocr does that) than figure it out for the terminal.

It would be good to get this working though:

pbpaste | llm -m claude-3-opus 'describe this image' -i -

Oh, that's frustrating: it looks like pbpaste only works for text content, I tried pbpaste > /tmp/image.png and got a 0 byte file.

ChatGPT did come up with this recipe which seems to work:

osascript -e 'set theImage to the clipboard as «class PNGf»' \
  -e 'set theFile to open for access POSIX file "/tmp/clipboard.png" with write permission' \
  -e 'write theImage to theFile' \
  -e 'close access theFile' \
  && cat /tmp/clipboard.png && rm /tmp/clipboard.png

I imagine there are cleaner implementations than that. Would be easy to wrap one into a little zsh script or similar.

simonw avatar Apr 04 '24 23:04 simonw