azure-search-openai-demo
Embeddings vector dimensions mismatch indexer error
This issue is for a: (mark with an x)
- [X] bug report -> please search issues before submitting
Minimal steps to reproduce
Set the .env variables as follows:
AZURE_OPENAI_EMB_DEPLOYMENT="text-embedding-3-large"
AZURE_OPENAI_EMB_DEPLOYMENT_CAPACITY=350
AZURE_OPENAI_EMB_DEPLOYMENT_VERSION=1
AZURE_OPENAI_EMB_DIMENSIONS=1536
USE_FEATURE_INT_VECTORIZATION="true"
Then run azd up.
Any log messages given by the failure
When the indexer runs, it fails with this error:
There's a mismatch in vector dimensions. The vector field 'embedding', with dimension of '1536', expects a length of '1536'. However, the provided vector has a length of '3072'. Please ensure that the vector length matches the expected length of the vector field. Read the following documentation for more details: https://learn.microsoft.com/en-us/azure/search/vector-search-how-to-configure-compression-storage.
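The two numbers in this error line up with the models' output sizes: text-embedding-3-large returns 3072-dimensional vectors unless a smaller `dimensions` value is requested, while the index field was created with 1536 dimensions. A minimal sketch of that reasoning, assuming only the three Azure OpenAI embedding models listed; `expected_embedding_length` and `check_vector_field` are hypothetical helpers, not part of the repo or the Azure SDK:

```python
from typing import Optional

# Full-size output length of each embedding model (hypothetical lookup table).
NATIVE_DIMENSIONS = {
    "text-embedding-ada-002": 1536,
    "text-embedding-3-small": 1536,
    "text-embedding-3-large": 3072,
}

# Only the text-embedding-3 models accept a reduced "dimensions" value.
SUPPORTS_DIMENSIONS = {"text-embedding-3-small", "text-embedding-3-large"}


def expected_embedding_length(model_name: str, dimensions: Optional[int]) -> int:
    """Length of the vector the model returns for a given skill config.

    Raises KeyError for models not in the table above.
    """
    native = NATIVE_DIMENSIONS[model_name]
    if dimensions is None:
        # No dimensions requested (e.g. "dimensions": null): full-size vector.
        return native
    if model_name not in SUPPORTS_DIMENSIONS or not 1 <= dimensions <= native:
        raise ValueError(f"{model_name} cannot produce {dimensions}-dimensional vectors")
    return dimensions


def check_vector_field(field_dimensions: int, vector_length: int) -> None:
    """Mimic the indexer's check that a vector matches its index field."""
    if vector_length != field_dimensions:
        raise ValueError(
            f"vector field expects length {field_dimensions}, got {vector_length}"
        )
```

With `dimensions` left null, `expected_embedding_length("text-embedding-3-large", None)` gives 3072, which fails the check against a 1536-dimensional field; passing 1536 makes both sides agree.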
When inspecting the JSON definition of gptkbindex-skillset in the portal, I noticed this excerpt:
{
  "@odata.type": "#Microsoft.Skills.Text.AzureOpenAIEmbeddingSkill",
  "name": "#2",
  "description": "Skill to generate embeddings via Azure OpenAI",
  "context": "/document/pages/*",
  "resourceUri": "https://cog-trnz2cbjn4ofs.openai.azure.com",
  "apiKey": null,
  "deploymentId": "text-embedding-3-large",
  "dimensions": null,
  "modelName": null,
  ...
}
So both dimensions and modelName are null. Additionally, the portal shows this warning in a banner above the JSON:
This skillset contains an AzureOpenAIEmbedding Skill created by previous API versions that doesn't include the 'modelName' field. We recommend you to migrate by adding 'experimental' value automatically to the field to restore full portal functionality.
If I manually edit the skillset JSON in the portal and set these two fields as follows, the indexer works:
"dimensions": 1536,
"modelName": "text-embedding-3-large",
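The manual portal fix above can also be applied to an exported skillset definition. A minimal sketch, assuming the skillset JSON has already been loaded into a Python dict (for example from the portal's JSON view); `fill_embedding_skill_fields` is a hypothetical helper, and pushing the updated definition back to the search service is not shown:

```python
from typing import Any, Dict

EMBEDDING_SKILL_TYPE = "#Microsoft.Skills.Text.AzureOpenAIEmbeddingSkill"


def fill_embedding_skill_fields(
    skillset: Dict[str, Any], dimensions: int, model_name: str
) -> int:
    """Fill in null dimensions/modelName on Azure OpenAI embedding skills.

    Mutates the skillset dict in place and returns how many skills changed.
    """
    changed = 0
    for skill in skillset.get("skills", []):
        # Leave every other skill type (e.g. the split skill) untouched.
        if skill.get("@odata.type") != EMBEDDING_SKILL_TYPE:
            continue
        updated = False
        if skill.get("dimensions") is None:
            skill["dimensions"] = dimensions
            updated = True
        if skill.get("modelName") is None:
            skill["modelName"] = model_name
            updated = True
        if updated:
            changed += 1
    return changed
```

Only null or missing values are filled, so running it twice on the same definition is harmless: the second call reports zero changes.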
I tried to change the code in integratedvectorizerstrategy.py to this:
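import os

from azure.search.documents.indexes.models import (
    AzureOpenAIEmbeddingSkill,
    InputFieldMappingEntry,
    OutputFieldMappingEntry,
)

# Read the embedding settings from the azd environment.
# int(None) raises TypeError, so fall back to None when the variable is unset.
dims_value = os.getenv("AZURE_OPENAI_EMB_DIMENSIONS")
embedding_dimensions = int(dims_value) if dims_value else None
embedding_model_name = os.getenv("AZURE_OPENAI_EMB_MODEL_NAME")

embedding_skill = AzureOpenAIEmbeddingSkill(
    description="Skill to generate embeddings via Azure OpenAI",
    context="/document/pages/*",
    resource_uri=f"https://{self.embeddings.open_ai_service}.openai.azure.com",
    deployment_id=self.embeddings.open_ai_deployment,
    dimensions=embedding_dimensions,
    # The Python SDK takes snake_case keyword arguments; a camelCase modelName is not a known attribute.
    model_name=embedding_model_name,
    inputs=[InputFieldMappingEntry(name="text", source="/document/pages/*")],
    outputs=[OutputFieldMappingEntry(name="embedding", target_name="vector")],
)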
However, this change has no effect on the skillset definition shown in the portal, even after I delete the skillset entirely to force it to be regenerated.
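One way to narrow this down is to check which values actually reach the skill definition. A minimal sketch, assuming the REST property names seen in the portal JSON (resourceUri, deploymentId, dimensions, modelName); `build_embedding_skill` is a hypothetical helper that takes the environment as a plain mapping. Note that AZURE_OPENAI_EMB_MODEL_NAME is read by the code but does not appear in the .env list above; if it is unset, modelName stays null here too:

```python
import os
from typing import Any, Dict, Mapping


def build_embedding_skill(env: Mapping[str, str], service_name: str) -> Dict[str, Any]:
    """Build the embedding skill JSON from environment values (hypothetical helper)."""
    raw_dims = env.get("AZURE_OPENAI_EMB_DIMENSIONS")
    # Treat a missing or empty value as "not set" instead of crashing in int().
    dimensions = int(raw_dims) if raw_dims else None
    return {
        "@odata.type": "#Microsoft.Skills.Text.AzureOpenAIEmbeddingSkill",
        "description": "Skill to generate embeddings via Azure OpenAI",
        "context": "/document/pages/*",
        "resourceUri": f"https://{service_name}.openai.azure.com",
        "deploymentId": env["AZURE_OPENAI_EMB_DEPLOYMENT"],
        "dimensions": dimensions,
        # None here serializes to null, matching what the portal showed.
        "modelName": env.get("AZURE_OPENAI_EMB_MODEL_NAME"),
        "inputs": [{"name": "text", "source": "/document/pages/*"}],
        "outputs": [{"name": "embedding", "targetName": "vector"}],
    }


if __name__ == "__main__":
    import json

    # Print what the current environment would produce.
    print(json.dumps(build_embedding_skill(os.environ, "cog-example"), indent=2))
```

If this prints the expected dimensions and model name while the portal still shows null, the values are being lost later, when the skillset is created, rather than in the environment.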
Expected/desired behavior
No indexer error.
OS and Version?
Windows 11
azd version?
azd version 1.9.5 (commit cd2b7af9995d358aab33c782614f801ac1997dde)
Versions
I merged the latest commit from main (2024-07-16, #1789) into my local fork. I do have some local code modifications, but as far as I know none of them would affect this.