azure-search-openai-demo
Embedding models v3 for integrated vectorization
This issue is for a: (mark with an x)
- [ ] bug report -> please search issues before submitting
- [ ] feature request
- [x] documentation issue or request
- [ ] regression (a behavior that used to work and stopped in a new release)
Minimal steps to reproduce
Deploy the application with azd up
Any log messages given by the failure
n/a
Expected/desired behavior
OS and Version?
Windows 10
azd version?
1.9.5
Versions
n/a
Mention any other details that might be useful
Integrated vectorization now supports the v3 embedding models. I tried to set up the application with these environment variables (I already have a model deployment named "embedding-3s"):
USE_FEATURE_INT_VECTORIZATION="true"
AZURE_OPENAI_EMB_DEPLOYMENT="embedding-3s"
AZURE_OPENAI_EMB_DEPLOYMENT_VERSION=1
AZURE_OPENAI_EMB_DIMENSIONS=1536
AZURE_OPENAI_EMB_MODEL_NAME="text-embedding-3-small"
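For illustration, the settings above can be sanity-checked before running azd up. This is a minimal, hypothetical sketch: load_embedding_settings and MAX_DIMENSIONS are not part of the repo. It assumes the published Azure OpenAI limits (text-embedding-ada-002 always returns 1536 dimensions, text-embedding-3-small allows up to 1536, text-embedding-3-large up to 3072) and that values arrive with their surrounding quotes already stripped.

```python
import os
from collections.abc import Mapping
from dataclasses import dataclass

# Assumed per-model output limits (hypothetical table, not read from the repo).
# ada-002 has a fixed size; the v3 models accept a smaller "dimensions" value.
MAX_DIMENSIONS = {
    "text-embedding-ada-002": 1536,
    "text-embedding-3-small": 1536,
    "text-embedding-3-large": 3072,
}


@dataclass
class EmbeddingSettings:
    deployment: str
    model_name: str
    dimensions: int


def load_embedding_settings(env: Mapping[str, str] = os.environ) -> EmbeddingSettings:
    """Hypothetical helper: read the embedding env vars and check they are consistent."""
    model_name = env["AZURE_OPENAI_EMB_MODEL_NAME"]
    dimensions = int(env["AZURE_OPENAI_EMB_DIMENSIONS"])
    if model_name not in MAX_DIMENSIONS:
        raise ValueError(f"Unknown embedding model: {model_name}")
    max_dims = MAX_DIMENSIONS[model_name]
    # ada-002 cannot shorten its vectors, so only its native size is valid.
    if model_name == "text-embedding-ada-002" and dimensions != max_dims:
        raise ValueError("text-embedding-ada-002 only produces 1536 dimensions")
    if not 1 <= dimensions <= max_dims:
        raise ValueError(f"{model_name} supports 1..{max_dims} dimensions, got {dimensions}")
    return EmbeddingSettings(
        deployment=env["AZURE_OPENAI_EMB_DEPLOYMENT"],
        model_name=model_name,
        dimensions=dimensions,
    )
```

With the values from this issue, the check passes; a value such as 3072 for text-embedding-3-small would be rejected before any deployment happens.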
It seems to work: prepdocs.py finished without errors, and the indexer ran successfully on my small test set of documents.
However, the resulting skillset has null values for the dimensions and modelName fields. Does that matter? I don't know enough about this to tell:
{
  "@odata.type": "#Microsoft.Skills.Text.AzureOpenAIEmbeddingSkill",
  "name": "#2",
  "description": "Skill to generate embeddings via Azure OpenAI",
  "context": "/document/pages/*",
  "resourceUri": "https://azoai-shared-eus2-dev.openai.azure.com",
  "apiKey": null,
  "deploymentId": "embedding-3s",
  "dimensions": null,
  "modelName": null,
  "inputs": [
    {
      "name": "text",
      "source": "/document/pages/*"
    }
  ],
  "outputs": [
    {
      "name": "embedding",
      "targetName": "vector"
    }
  ],
  "authIdentity": null
}
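For comparison, here is a minimal sketch of what the skill definition looks like once modelName and dimensions are populated, working directly on a trimmed copy of the JSON shape shown above. The helpers find_null_fields and with_model_settings are hypothetical and only illustrate which fields differ; they are not how the repo or the Azure AI Search SDK sets these values.

```python
import json
from typing import Any


def find_null_fields(skill: dict[str, Any], fields=("dimensions", "modelName")) -> list[str]:
    """Return the names of the given skill fields that are missing or null."""
    return [f for f in fields if skill.get(f) is None]


def with_model_settings(skill: dict[str, Any], model_name: str, dimensions: int) -> dict[str, Any]:
    """Return a shallow copy of the skill with modelName and dimensions filled in."""
    updated = dict(skill)
    updated["modelName"] = model_name
    updated["dimensions"] = dimensions
    return updated


# Trimmed copy of the skill shown above (resourceUri, apiKey, etc. omitted).
skill = {
    "@odata.type": "#Microsoft.Skills.Text.AzureOpenAIEmbeddingSkill",
    "name": "#2",
    "context": "/document/pages/*",
    "deploymentId": "embedding-3s",
    "dimensions": None,
    "modelName": None,
    "inputs": [{"name": "text", "source": "/document/pages/*"}],
    "outputs": [{"name": "embedding", "targetName": "vector"}],
}

print(find_null_fields(skill))  # ['dimensions', 'modelName']
fixed = with_model_settings(skill, "text-embedding-3-small", 1536)
print(find_null_fields(fixed))  # []
print(json.dumps({k: fixed[k] for k in ("deploymentId", "modelName", "dimensions")}))
```

The copy is shallow on purpose: only top-level scalar fields change, so the original definition is left untouched for comparison.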
Could you confirm whether the v3 embedding models work and can be deployed correctly with prepdocs.py, and then update the documentation?
I'm checking in with the Azure AI Search team about this, it's possible that an Azure AI Search SDK update would be needed.
Response from AI Search team:
This is supported in the latest SDK versions (preview and GA). Here's how to use them with Python: azure-search-vector-samples/demo-python/code/e2e-demos/azure-ai-search-e2e-build-demo.ipynb at main · Azure/azure-search-vector-samples (github.com). The newest AOAI embedding models have a model property that wasn't needed (or present) with ada-002, so it is available only in the newest SDK versions.
So we'd need to update this repo to bring in the latest SDK version and verify everything still works as expected. If you have time to make that change, please consider making a PR. I don't know when I'll get to it.
I've updated the SDK and I now pass in embedding dimensions, so this should work: https://github.com/Azure-Samples/azure-search-openai-demo/releases/tag/2024-10-17 Please file a new issue if that's not working.