azure-sdk-for-net

added support for DataUris when using Vision

Open dersia opened this issue 10 months ago • 9 comments

Azure OpenAI SDK should support DataUris when calling Vision Models.

There are multiple issues open across multiple repositories that all ask for the same thing to be fixed. I had opened a few PRs to solve this, but in the end they were all stuck on changing the Azure REST API spec to accommodate the changes needed. Or so I thought. I found a better solution that, unlike the PR https://github.com/Azure/azure-rest-api-specs/pull/27780, would not be a breaking change. It leverages the option of adding custom code on top of the generated code, which this repo already uses.

This will add a property string DataUri { get; } to ChatMessageImageUrl and override its serialization methods. When serializing, it checks whether DataUri is set to any content and, if so, prefers it over the System.Uri. When deserializing, it tries to deserialize to System.Uri, and if that fails with the message "Invalid URI: The Uri string is too long.", it sets DataUri to the returned value.
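The serialize/deserialize behavior described above can be sketched roughly as follows. This is a minimal, language-neutral illustration in Python, not the PR's actual C# code: the class name ImageUrl, the helper parse_strict_uri, the "url" JSON field, and the length cap are all assumptions made for the sketch, with parse_strict_uri standing in for System.Uri construction.

```python
import base64
import json
from dataclasses import dataclass
from typing import Optional

# Assumed cap used only for this sketch; it stands in for the System.Uri length limit.
MAX_URI_LENGTH = 65519


class UriTooLongError(ValueError):
    """Hypothetical error mirroring 'Invalid URI: The Uri string is too long.'"""


def parse_strict_uri(value: str) -> str:
    # Hypothetical stand-in for constructing a System.Uri: rejects overly long strings.
    if len(value) > MAX_URI_LENGTH:
        raise UriTooLongError("Invalid URI: The Uri string is too long.")
    return value


@dataclass
class ImageUrl:
    # Hypothetical analog of ChatMessageImageUrl with the extra DataUri property.
    url: Optional[str] = None
    data_uri: Optional[str] = None

    def to_json(self) -> str:
        # Prefer data_uri whenever it is set to any content; otherwise use url.
        value = self.data_uri if self.data_uri else self.url
        return json.dumps({"url": value})

    @classmethod
    def from_json(cls, payload: str) -> "ImageUrl":
        raw = json.loads(payload)["url"]
        try:
            # First try the strict URI type.
            return cls(url=parse_strict_uri(raw))
        except UriTooLongError:
            # Fall back to keeping the raw string as a data URI.
            return cls(data_uri=raw)


def make_data_uri(image_bytes: bytes, mime_type: str = "image/png") -> str:
    # Build a base64 data URI from raw image bytes.
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"
```

The point of the design is that callers who only ever use regular URLs see no change, while a long base64 payload round-trips through the extra property instead of failing in the strict URI type.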

This will solve the following issues https://github.com/Azure/azure-sdk-for-net/issues/42591 https://github.com/Azure/azure-sdk-for-net/issues/40744 https://github.com/Azure/azure-sdk-for-net/issues/40855 https://github.com/Azure/azure-rest-api-specs/pull/27780

This will also avoid the limitations of System.Uri (at least for the purpose of use with vision) as described here: https://github.com/dotnet/runtime/issues/96544

After this is merged I can finally do the remaining part of this issue: https://github.com/microsoft/semantic-kernel/issues/4781 and solve https://github.com/microsoft/semantic-kernel/issues/4272

I hope this will be merged fast and we can finally get DataUri images from the sdk to Azure GPT.

dersia avatar Mar 29 '24 13:03 dersia

Thank you for your contribution @dersia! We will review the pull request and get back to you soon.

github-actions[bot] avatar Mar 29 '24 13:03 github-actions[bot]

I will rebase and resolve the conflict once there is a review on this. The conflict is to be expected as long as this isn't merged.

dersia avatar Apr 02 '24 18:04 dersia

@KrzysztofCwalina or @tg-msft may I ask for a review? 😊 this is very blocking

dersia avatar Apr 04 '24 06:04 dersia

@m-nash can we resolve the conversations and move this forward?

dersia avatar Apr 11 '24 18:04 dersia

I can rebase and resolve the conflict over and over again, since it comes from regenerated files, but as long as this is not merged it will be very repetitive 😓 So I would love to get feedback and a go-ahead, so I can do a final rebase and we can finally merge this. There are other issues waiting on this so we can finally use Azure OpenAI and Semantic Kernel with GPT Vision.

So if any of the maintainers could please help out here, I'd appreciate it.

@m-nash @joseharriaga @jpalvarezl @trrwilson @tg-msft

dersia avatar Apr 16 '24 20:04 dersia

Hoping this gets merged, our team at Microsoft wants to use this too.

austinbaccus avatar Apr 16 '24 20:04 austinbaccus

I'm sure there are a lot of people that would like to see this merged soon, please.

We are blocked on using Vision in our service because we use private networking, and SAS URLs to images in a storage container do not work either.

This PR would resolve that issue in a really nice way.

Our other option is to drop the SDK altogether and fall back to native REST, but that seems like trying to "crack a nut with a sledgehammer".

davidames avatar Apr 25 '24 18:04 davidames

Can someone review and merge this pull request please, @m-nash @KrzysztofCwalina? It would be highly appreciated. We have been blocked on using Vision in our service for a long time.

vandana2015 avatar Apr 26 '24 04:04 vandana2015

Also eager to support this effort! We are having to resort to a temporary/hacky workaround, a custom connector implementation specifically for Vision calls, in order to stay with our SK-focused strategy. It's the key sticking point for a multi-modal solution for us.

AdaTheDev avatar Apr 26 '24 07:04 AdaTheDev

@dersia We really appreciate you opening the PR and pushing things forward. Closing this PR, as the linked PR from @trrwilson fixed the problem. Use version 1.0.0-beta.17 or later to get this support. See the changelog entry at https://github.com/Azure/azure-sdk-for-net/blob/main/sdk/openai/Azure.AI.OpenAI/CHANGELOG.md#features-added-3.

scottaddie avatar Jun 26 '24 21:06 scottaddie