azure-search-openai-demo

Embedding models v3 for integrated vectorization

Open egor-yudkin opened this issue 1 year ago • 2 comments

This issue is for a: (mark with an x)

- [ ] bug report -> please search issues before submitting
- [ ] feature request
- [x] documentation issue or request
- [ ] regression (a behavior that used to work and stopped in a new release)

Minimal steps to reproduce

Deploy application with azd up

Any log messages given by the failure

n/a

Expected/desired behavior

OS and Version?

Windows 10

azd version?

1.9.5

Versions

n/a

Mention any other details that might be useful

Integrated vectorization supports embedding v3 models now. I tried to set up the application with these environment variables (I have the model deployment called "embedding-3s" already):

USE_FEATURE_INT_VECTORIZATION="true"
AZURE_OPENAI_EMB_DEPLOYMENT="embedding-3s"
AZURE_OPENAI_EMB_DEPLOYMENT_VERSION=1
AZURE_OPENAI_EMB_DIMENSIONS=1536
AZURE_OPENAI_EMB_MODEL_NAME="text-embedding-3-small"

It seems to work: prepdocs.py finished with no issues, and the indexer ran fine on my small test set of documents.

The skillset has null values for the dimensions and modelName fields. Does that matter? I don't know enough about this...

{
  "@odata.type": "#Microsoft.Skills.Text.AzureOpenAIEmbeddingSkill",
  "name": "#2",
  "description": "Skill to generate embeddings via Azure OpenAI",
  "context": "/document/pages/*",
  "resourceUri": "https://azoai-shared-eus2-dev.openai.azure.com",
  "apiKey": null,
  "deploymentId": "embedding-3s",
  "dimensions": null,
  "modelName": null,
  "inputs": [
    {
      "name": "text",
      "source": "/document/pages/*"
    }
  ],
  "outputs": [
    {
      "name": "embedding",
      "targetName": "vector"
    }
  ],
  "authIdentity": null
}
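For anyone who wants to check their own skillset for the same nulls, here is a minimal sketch using only the Python standard library. It assumes the skill JSON has been copied out of the portal or a REST response; the helper name `find_unset_embedding_fields` is hypothetical and not part of this repo or any SDK. It only flags the unset fields and says nothing about whether the service needs them.

```python
import json

EMBEDDING_SKILL_TYPE = "#Microsoft.Skills.Text.AzureOpenAIEmbeddingSkill"


def find_unset_embedding_fields(skills, fields=("dimensions", "modelName")):
    """Return {skill name: [unset fields]} for Azure OpenAI embedding skills.

    Hypothetical helper: `skills` is a list of skill dicts as they appear
    in a skillset definition.
    """
    report = {}
    for skill in skills:
        if skill.get("@odata.type") != EMBEDDING_SKILL_TYPE:
            continue  # ignore split skills, OCR skills, and so on
        unset = [name for name in fields if skill.get(name) is None]
        if unset:
            report[skill.get("name", "<unnamed>")] = unset
    return report


# A trimmed copy of the skill shown above.
skill_json = """
{
  "@odata.type": "#Microsoft.Skills.Text.AzureOpenAIEmbeddingSkill",
  "name": "#2",
  "deploymentId": "embedding-3s",
  "dimensions": null,
  "modelName": null
}
"""

skills = [json.loads(skill_json)]
print(find_unset_embedding_fields(skills))
```

An empty result means every embedding skill in the list has both fields set.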

Could you confirm that the embedding v3 models work and can be deployed correctly with prepdocs.py, and then update the documentation?


egor-yudkin, Aug 27 '24 21:08

I'm checking in with the Azure AI Search team about this; it's possible that an Azure AI Search SDK update would be needed.

pamelafox, Sep 03 '24 23:09

Response from AI Search team:

This is supported using the latest SDK versions (preview and GA). Here's how to use them with Python: azure-search-vector-samples/demo-python/code/e2e-demos/azure-ai-search-e2e-build-demo.ipynb at main in the Azure/azure-search-vector-samples repo on GitHub. The newest AOAI embedding models have a model property that wasn't needed or present when using ada-002, so it is available only in the newest SDK versions.

So we'd need to update this repo to bring in the latest SDK version and verify everything still works as expected. If you have time to make that change, please consider making a PR. I don't know when I'll get to it.

pamelafox, Sep 04 '24 00:09

I've updated the SDK and I now pass in embedding dimensions, so this should work: https://github.com/Azure-Samples/azure-search-openai-demo/releases/tag/2024-10-17. Please file a new issue if that's not working.
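To illustrate what passing these values through amounts to, here is a rough standard-library sketch. It is not the code from the release: the `ENV_TO_SKILL_FIELD` mapping and the `apply_embedding_settings` helper are hypothetical, and it assumes the environment variable names and skill JSON field names quoted earlier in this thread.

```python
import os

# Hypothetical mapping from the environment variables in this issue
# to the skill JSON field names shown in the skillset above.
ENV_TO_SKILL_FIELD = {
    "AZURE_OPENAI_EMB_DEPLOYMENT": ("deploymentId", str),
    "AZURE_OPENAI_EMB_MODEL_NAME": ("modelName", str),
    "AZURE_OPENAI_EMB_DIMENSIONS": ("dimensions", int),
}


def apply_embedding_settings(skill, env=None):
    """Return a copy of `skill` with embedding fields filled in from `env`.

    Variables that are missing or empty leave the existing value untouched.
    """
    env = os.environ if env is None else env
    updated = dict(skill)
    for variable, (field, convert) in ENV_TO_SKILL_FIELD.items():
        value = env.get(variable)
        if value:
            updated[field] = convert(value)
    return updated


skill = {"deploymentId": "embedding-3s", "dimensions": None, "modelName": None}
env = {
    "AZURE_OPENAI_EMB_DEPLOYMENT": "embedding-3s",
    "AZURE_OPENAI_EMB_DIMENSIONS": "1536",
    "AZURE_OPENAI_EMB_MODEL_NAME": "text-embedding-3-small",
}
print(apply_embedding_settings(skill, env))
```

Passing `env` explicitly keeps the sketch testable; calling it with no second argument reads from the real process environment instead.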

pamelafox, Oct 18 '24 00:10