[BUG] Different embedding 3 vectors in Azure vs. OpenAI

Open thai-op opened this issue 1 year ago • 3 comments

Describe the bug When you send the same input to the same model (text-embedding-3-large) with the same API version on Azure and on OpenAI, the returned vectors differ slightly. The per-component differences are small (< 0.0001), but in aggregate they are enough to produce bad ranking results when vectors from the two services are mixed.

I don't see any documentation describing this behavior so I'm just asking here in case anyone from the Azure team knows an answer to this.

curl -H 'Authorization:Bearer <key>' -H 'Content-Type: application/json' https://api.openai.com/v1/embeddings\?api-version\=2024-03-01-preview -d "{\"input\":\"Pancreatitis in Dogs\",\"model\":\"text-embedding-3-large\"}" -o /tmp/embedding-2024-03-01-preview.openai.json

curl -H 'api-key: <key>' -H 'Content-Type: application/json' https://<region_deployment>.azure.com/openai/deployments/<deployment>/embeddings\?api-version\=2024-03-01-preview -d "{\"input\":\"Pancreatitis in Dogs\",\"model\":\"text-embedding-3-large\"}" -o /tmp/embedding-2024-03-01-preview.azure.json

diff /tmp/embedding-2024-03-01-preview.azure.json /tmp/embedding-2024-03-01-preview.openai.json -y | less

{                                                                       {
  "object": "list",                                                       "object": "list",
  "data": [                                                               "data": [
    {                                                                       {
      "object": "embedding",                                                  "object": "embedding",
      "index": 0,                                                             "index": 0,
      "embedding": [                                                          "embedding": [
        -0.01958797,                                            |               -0.019557063,
        0.00020787802,                                          |               0.00021087051,
        -0.013025163,                                           |               -0.013026878,
        0.05607405,                                             |               0.055992138,
        0.02763522,                                             |               0.027616536,
        0.012355488,                                            |               0.012345954,
        0.014163609,                                            |               0.014165475,
        -0.019387066,                                           |               -0.019389622,
        0.03151933,                                             |               0.031568136,
        0.0012730785,                                           |               0.0012551069,
        0.009816307,                                            |               0.009778531,
        0.016228437,                                            |               0.016241739,
        0.036809757,                                            |               0.03685926,
        -0.045493197,                                           |               -0.04547687,
        -0.023371628,                                           |               -0.023374708,
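To put a number on the difference rather than eyeballing the diff, the two vectors can be compared directly. The following is a minimal sketch, not part of the original report: the helper names are hypothetical, `load_embedding` assumes the response shape shown in the diff above (`data[0].embedding`), and the sample lists hold only the 15 components visible in the diff, so the result describes that prefix and not the full vector.

```python
import json
import math


def load_embedding(path):
    """Read the first embedding vector from a saved embeddings response.

    Assumes the JSON layout shown in the diff: {"data": [{"embedding": [...]}]}.
    """
    with open(path) as f:
        return json.load(f)["data"][0]["embedding"]


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def max_abs_diff(a, b):
    """Largest per-component absolute difference."""
    return max(abs(x - y) for x, y in zip(a, b))


# First 15 components copied from the diff output above (a prefix only,
# not the full vector).
AZURE_HEAD = [
    -0.01958797, 0.00020787802, -0.013025163, 0.05607405, 0.02763522,
    0.012355488, 0.014163609, -0.019387066, 0.03151933, 0.0012730785,
    0.009816307, 0.016228437, 0.036809757, -0.045493197, -0.023371628,
]
OPENAI_HEAD = [
    -0.019557063, 0.00021087051, -0.013026878, 0.055992138, 0.027616536,
    0.012345954, 0.014165475, -0.019389622, 0.031568136, 0.0012551069,
    0.009778531, 0.016241739, 0.03685926, -0.04547687, -0.023374708,
]

if __name__ == "__main__":
    print("cosine similarity (prefix):", cosine_similarity(AZURE_HEAD, OPENAI_HEAD))
    print("max abs difference (prefix):", max_abs_diff(AZURE_HEAD, OPENAI_HEAD))
```

To compare the full vectors, pass the two saved response files from the curl commands to `load_embedding` and feed the results to the same two functions.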

Exception or Stack Trace See above

To Reproduce See above

Code Snippet Add the code snippet that causes the issue.

Expected behavior The vectors should be exactly the same

Setup (please complete the following information): n/a

Information Checklist Kindly make sure that you have added all of the following information above and checked off the required fields; otherwise we will treat the issue as an incomplete report

  • [X] Bug Description Added
  • [X] Repro Steps Added
  • [X] Setup information Added

thai-op avatar May 17 '24 19:05 thai-op

Hi @thai-op, given your sample in this issue is using cURL, and not the azure-ai-openai SDK, this seems to be a service issue and not an SDK issue, is that correct?

alzimmermsft avatar May 17 '24 19:05 alzimmermsft

Right, it’s not an SDK issue. The API service is the issue, but I don’t know where to raise it, so here I am.

thai-op avatar May 17 '24 19:05 thai-op

@brandom-msft @jpalvarezl do you know where this feedback should be re-routed to?

alzimmermsft avatar May 17 '24 20:05 alzimmermsft

Closing this issue because it is not an SDK issue.

mssfang avatar Aug 29 '24 17:08 mssfang