azure-sdk-for-net
added support for DataUris when using Vision
Azure OpenAI SDK should support DataUris when calling Vision Models.
There are multiple issues open across multiple repositories that all ask for the same fix. I had opened a few PRs to solve this, but in the end they were all stuck on changing the Azure REST API spec to make the changes needed to solve this issue. Or so I thought. I found a better solution that, unlike this PR https://github.com/Azure/azure-rest-api-specs/pull/27780, would not be a breaking change. I leverage the option of adding custom code on top of the generated code, which is already used in this repo.
This will add a property string DataUri { get; } to ChatMessageImageUrl and override the serialization methods for it.
When serializing, it checks whether DataUri is set to any content and, if so, prefers it over the System.Uri. When deserializing, it first tries to deserialize to System.Uri; if that fails with the message "Invalid URI: The Uri string is too long.", it sets DataUri to the returned value.
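The behaviour described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code from this PR: ImageUrlSketch is a hypothetical stand-in for ChatMessageImageUrl, the JSON property name "url" and the ToJson/FromJson helpers are invented for the example, and the exception message is copied from the description (it may be localized, and newer runtimes may not throw it at all).

```csharp
#nullable enable
using System;
using System.IO;
using System.Text;
using System.Text.Json;

// Hypothetical stand-in for ChatMessageImageUrl; not the SDK's actual type.
public sealed class ImageUrlSketch
{
    // Message quoted in the PR description; assumed to be the runtime's English text.
    private const string TooLongMessage = "Invalid URI: The Uri string is too long.";

    public Uri? Url { get; }
    public string? DataUri { get; }

    public ImageUrlSketch(Uri url) => Url = url;
    public ImageUrlSketch(string dataUri) => DataUri = dataUri;

    // Serialize: a non-empty DataUri is preferred over the System.Uri value.
    public string ToJson()
    {
        using var stream = new MemoryStream();
        using (var writer = new Utf8JsonWriter(stream))
        {
            writer.WriteStartObject();
            writer.WriteString("url", string.IsNullOrEmpty(DataUri) ? Url?.OriginalString : DataUri);
            writer.WriteEndObject();
        }
        return Encoding.UTF8.GetString(stream.ToArray());
    }

    // Deserialize: try System.Uri first, fall back to DataUri when the string is too long.
    public static ImageUrlSketch FromJson(string json)
    {
        using JsonDocument doc = JsonDocument.Parse(json);
        string value = doc.RootElement.GetProperty("url").GetString()
            ?? throw new JsonException("The \"url\" property is null.");
        try
        {
            return new ImageUrlSketch(new Uri(value));
        }
        catch (UriFormatException ex) when (ex.Message == TooLongMessage)
        {
            return new ImageUrlSketch(value);
        }
    }
}
```

Whichever branch is taken on read, the original string survives the round trip: it ends up either in Url.OriginalString or in DataUri. Matching on the exception message is fragile, which is one reason to treat this as a sketch only.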
This will solve the following issues https://github.com/Azure/azure-sdk-for-net/issues/42591 https://github.com/Azure/azure-sdk-for-net/issues/40744 https://github.com/Azure/azure-sdk-for-net/issues/40855 https://github.com/Azure/azure-rest-api-specs/pull/27780
This will also avoid the limitations of System.Uri (at least for the purpose of use with vision) as described here: https://github.com/dotnet/runtime/issues/96544
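To see why a length limit on System.Uri matters here, it helps to look at how quickly a base64 data URI grows. The helper below is a hypothetical illustration (DataUriSketch and FromBytes are not part of the SDK): base64 emits 4 characters for every 3 bytes, so a 48 KiB image (49,152 bytes) already produces 65,536 characters of payload before the "data:image/png;base64," prefix is added. The exact limit is not restated in this thread; see the linked runtime issue.

```csharp
using System;

public static class DataUriSketch
{
    // Hypothetical helper: wraps raw image bytes in a base64 data URI.
    public static string FromBytes(byte[] imageBytes, string mimeType)
        => $"data:{mimeType};base64,{Convert.ToBase64String(imageBytes)}";
}
```

For example, DataUriSketch.FromBytes(System.IO.File.ReadAllBytes("photo.png"), "image/png") would turn a local file (the file name is a placeholder) into a string that can be sent without the image being reachable over the network.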
After this is merged I can finally do the remaining part of this issue: https://github.com/microsoft/semantic-kernel/issues/4781 and solve https://github.com/microsoft/semantic-kernel/issues/4272
I hope this will be merged fast and we can finally get DataUri images from the sdk to Azure GPT.
Thank you for your contribution @dersia! We will review the pull request and get back to you soon.
I will rebase and resolve the conflict once there is a review on this. The conflict was to be expected as long as this isn't merged.
@KrzysztofCwalina or @tg-msft may I ask for a review? 😊 this is very blocking
@m-nash can we resolve the conversations and move this forward?
I can rebase and resolve the conflict over and over again, since it comes from regenerated files, but as long as this is not merged this will be very repetitive 😓 so I would love to get feedback and a go, so I can do a final rebase and we can finally merge this. There are other issues waiting for this so we can finally use Azure OpenAI and Semantic Kernel with GPT Vision.
so if any of the maintainers could please help here out, I'd appreciate it.
@m-nash @joseharriaga @jpalvarezl @trrwilson @tg-msft
Hoping this gets merged, our team at Microsoft wants to use this too.
I'm sure there are a lot of people that would like to see this merged soon, please.
We are blocked on using Vision in our service because we use private networking, and SAS URLs to images in a storage container do not work either.
This PR would resolve that issue in a really nice way.
Our other option is to drop the SDK altogether and fall back to native REST, but that seems like trying to "crack a nut with a sledge hammer".
Can someone review and merge this pull request please, @m-nash @KrzysztofCwalina? It would be highly appreciated. We have been blocked on using Vision in our service for a long time.
Also eager to support this effort! We are having to resort to a temporary/hacky workaround to use a custom connector implementation specifically for Vision calls in order to stay with our SK-focused strategy. It's the key sticking point for a multi-modal solution for us
@dersia We really appreciate you opening the PR and pushing things forward. Closing this PR, as the linked PR from @trrwilson fixed the problem. Use version 1.0.0-beta.17 or later to get this support. See the changelog entry at https://github.com/Azure/azure-sdk-for-net/blob/main/sdk/openai/Azure.AI.OpenAI/CHANGELOG.md#features-added-3.