
Unexpected Keyword Argument 'dimensions' in embeddings.create() Call

Open Lin-ux-404 opened this issue 1 year ago • 0 comments

Bug Report: Unexpected Keyword Argument 'dimensions' in embeddings.create() Call

Description

I get an error when running the script scripts/data_preparation.py. The get_embedding() function passes a dimensions argument to embeddings.create(), but embeddings.create() does not accept that argument.

How to Reproduce

  1. Run the following command:

    py scripts\data_preparation.py --config scripts\config.json --njobs=4 --form-rec-resource [FORM_REC_RESOURCE] --form-rec-key [FORM_REC_KEY] --embedding-model-endpoint [EMBEDDING_MODEL_ENDPOINT]
    
  2. Use the following configuration in scripts/config.json:

    [
        {
            "data_path": "[DATA_PATH]",
            "location": "eastus",
            "subscription_id": "[SUBSCRIPTION_ID]",
            "resource_group": "[RESOURCE_GROUP]",
            "search_service_name": "[SEARCH_SERVICE_NAME]",
            "index_name": "[INDEX_NAME]",
            "chunk_size": 1024,
            "token_overlap": 128,
            "semantic_config_name": "default",
            "language": "en",
            "vector_config_name": "default"
        }
    ]
    
  3. Ensure the following environment variables are set:

    FLAG_EMBEDDING_MODEL=AOAI
    FLAG_COHERE=ENGLISH
    FLAG_AAI=V3
    VECTOR_DIMENSION=1536
    AZURE_OPENAI_API_VERSION=2023-05-15
    AZURE_OPENAI_ENDPOINT=[OPEN_AI_ENDPOINT]
    AZURE_OPENAI_API_KEY=[OPENAI_KEY]
    COHERE_MULTILINGUAL_ENDPOINT=
    COHERE_MULTILINGUAL_API_KEY=
    COHERE_ENGLISH_ENDPOINT=
    COHERE_ENGLISH_API_KEY=
    

Error Message

The error message I encountered is:

Error getting embedding for chunk with error=Error getting embeddings with endpoint=[ENDPOINT] with error=Embeddings.create() got an unexpected keyword argument 'dimensions', retrying, current at 1 retry, 4 retries left
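
The failure is an ordinary Python TypeError, raised when a keyword is passed to a function whose signature does not declare it. A minimal sketch of the same failure, assuming a hypothetical stand-in named create_stub (it is not the real openai method and only mimics a keyword-only signature without dimensions):

```python
def create_stub(*, input, model="text-embedding-ada-002"):
    """Hypothetical stand-in for an embeddings.create() that lacks `dimensions`."""
    return {"model": model, "input": input}


def call_with_dimensions():
    """Call the stub the way the V3 branch does and capture the error text."""
    try:
        create_stub(model="my-deployment", input="hello", dimensions=1536)
    except TypeError as exc:
        # Python rejects the call before the function body runs.
        return str(exc)
    return None


message = call_with_dimensions()
print(message)
```

The printed text ends with the same "got an unexpected keyword argument 'dimensions'" wording as the log line above, which suggests the call is rejected client-side before any request reaches the endpoint.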

Suspected Cause

In scripts/data_utils.py, the get_embedding() function passes the dimensions argument to the embeddings.create() method, but the method signature does not include a dimensions parameter. Here's the relevant code snippet:

client = AzureOpenAI(api_version=api_version, azure_endpoint=base_url, api_key=api_key)

if FLAG_AOAI == "V2":
    embeddings = client.embeddings.create(model=deployment_id, input=text)
elif FLAG_AOAI == "V3":
    embeddings = client.embeddings.create(
        model=deployment_id, 
        input=text, 
        dimensions=int(os.getenv("VECTOR_DIMENSION", 1536))
    )

According to the documentation, embeddings.create() does not accept a dimensions argument. Here is the documented method signature:

def create(
    *,
    input: str | List[str] | List[int] | List[List[int]],
    model: str = 'text-embedding-ada-002',
    encoding_format: NotGiven | Literal['float', 'base64'] = NOT_GIVEN,
    user: str | NotGiven = NOT_GIVEN,
    extra_headers: Headers | None = None,
    extra_query: Query | None = None,
    extra_body: Body | None = None,
    timeout: float | Timeout | NotGiven | None = NOT_GIVEN
) -> CreateEmbeddingResponse

The method does not take a dimensions argument, which is likely causing the error.
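
One possible workaround, sketched under assumptions: pass dimensions only when the installed client's create() actually declares it. The helper names accepts_kwarg and build_embedding_kwargs are hypothetical and are not part of the repository or the openai package. Newer releases of the openai package do accept dimensions, so upgrading the package is probably the simpler fix; a guard like this only matters if the script has to run against both older and newer clients.

```python
import inspect
import os


def accepts_kwarg(func, name):
    """Return True if `func` declares `name` or takes arbitrary **kwargs."""
    params = inspect.signature(func).parameters
    if name in params:
        return True
    return any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values())


def build_embedding_kwargs(create_func, deployment_id, text):
    """Assemble keyword arguments, adding `dimensions` only when supported."""
    kwargs = {"model": deployment_id, "input": text}
    if accepts_kwarg(create_func, "dimensions"):
        # Same environment variable and default as the snippet in the issue.
        kwargs["dimensions"] = int(os.getenv("VECTOR_DIMENSION", "1536"))
    return kwargs
```

Inside get_embedding() this could be used as client.embeddings.create(**build_embedding_kwargs(client.embeddings.create, deployment_id, text)). Note that with an older client the request then falls back to the model's default embedding size, which may or may not match VECTOR_DIMENSION.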

Lin-ux-404, Sep 30 '24 12:09