azure-sdk-for-java
                                
[BUG] Different embedding 3 vectors in Azure vs. OpenAI
Describe the bug When you send the same input with the same API version and the same model (text-embedding-3-large) to Azure and to OpenAI, you get slightly different vectors back. The per-component differences are small (< 0.0001), but in aggregate they are large enough to produce bad ranking results when we mix embeddings from the two services.
I don't see any documentation describing this behavior, so I'm asking here in case anyone from the Azure team knows the answer.
curl -H 'Authorization:Bearer <key>' -H 'Content-Type: application/json' https://api.openai.com/v1/embeddings\?api-version\=2024-03-01-preview -d "{\"input\":\"Pancreatitis in Dogs\",\"model\":\"text-embedding-3-large\"}" -o /tmp/embedding-2024-03-01-preview.openai.json
curl -H 'api-key: <key>' -H 'Content-Type: application/json' https://<region_deployment>.azure.com/openai/deployments/<deployment>/embeddings\?api-version\=2024-03-01-preview -d "{\"input\":\"Pancreatitis in Dogs\",\"model\":\"text-embedding-3-large\"}" -o /tmp/embedding-2024-03-01-preview.azure.json
diff /tmp/embedding-2024-03-01-preview.azure.json /tmp/embedding-2024-03-01-preview.openai.json -y | less
{                                                                       {
  "object": "list",                                                       "object": "list",
  "data": [                                                               "data": [
    {                                                                       {
      "object": "embedding",                                                  "object": "embedding",
      "index": 0,                                                             "index": 0,
      "embedding": [                                                          "embedding": [
        -0.01958797,                                            |               -0.019557063,
        0.00020787802,                                          |               0.00021087051,
        -0.013025163,                                           |               -0.013026878,
        0.05607405,                                             |               0.055992138,
        0.02763522,                                             |               0.027616536,
        0.012355488,                                            |               0.012345954,
        0.014163609,                                            |               0.014165475,
        -0.019387066,                                           |               -0.019389622,
        0.03151933,                                             |               0.031568136,
        0.0012730785,                                           |               0.0012551069,
        0.009816307,                                            |               0.009778531,
        0.016228437,                                            |               0.016241739,
        0.036809757,                                            |               0.03685926,
        -0.045493197,                                           |               -0.04547687,
        -0.023371628,                                           |               -0.023374708,
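To put a number on the drift shown in the diff, here is a minimal sketch that compares the two outputs. It uses only the first 15 components visible above, so the results describe that prefix and not the full vectors. The helper names `max_abs_diff` and `cosine_similarity` are illustrative and not part of either service's API.

```python
import math

# First 15 components of each response, copied from the diff above.
# These are truncated prefixes, not the full embeddings.
azure_vec = [
    -0.01958797, 0.00020787802, -0.013025163, 0.05607405, 0.02763522,
    0.012355488, 0.014163609, -0.019387066, 0.03151933, 0.0012730785,
    0.009816307, 0.016228437, 0.036809757, -0.045493197, -0.023371628,
]
openai_vec = [
    -0.019557063, 0.00021087051, -0.013026878, 0.055992138, 0.027616536,
    0.012345954, 0.014165475, -0.019389622, 0.031568136, 0.0012551069,
    0.009778531, 0.016241739, 0.03685926, -0.04547687, -0.023374708,
]


def max_abs_diff(a, b):
    """Largest per-component absolute difference between two vectors."""
    return max(abs(x - y) for x, y in zip(a, b))


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


print("max abs diff:", max_abs_diff(azure_vec, openai_vec))
print("cosine similarity:", cosine_similarity(azure_vec, openai_vec))
```

Running the same comparison on the complete JSON responses (for example, after loading both files with `json.load`) would show whether the full vectors stay close enough for a given ranking use case.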
Exception or Stack Trace See above
To Reproduce See above
Code Snippet Add the code snippet that causes the issue.
Expected behavior The vectors should be exactly the same
Setup (please complete the following information): n/a
Information Checklist Kindly make sure that you have added all the following information above and checked off the required fields; otherwise we will treat the issue as an incomplete report
- [X] Bug Description Added
- [X] Repro Steps Added
- [X] Setup information Added
Hi @thai-op, given your sample in this issue is using cURL, and not the azure-ai-openai SDK, this seems to be a service issue and not an SDK issue, is that correct?
Right, it’s not an sdk issue. The API service is the issue but I don’t know where to raise it so here I am.
On Fri, May 17, 2024 at 9:30 AM Alan Zimmer @.***> wrote:
Hi @thai-op, given your sample in this issue is using cURL, and not the azure-ai-openai SDK, this seems to be a service issue and not an SDK issue, is that correct?
@brandom-msft @jpalvarezl do you know where this feedback should be re-routed to?
Closing this issue as it is not related to the SDK.