azure-sdk-for-python AI Model Inference Readme and Samples Issue

Section link1, link2, link3, link4, link5, link6, link7, link8, link9, link10, link11, link12, link13, link14, link15, link16, link17, link18:

Reason: Unauthorized. Access token is missing, invalid, audience is incorrect, or have expired.

Section link1

Reason: Indent error.

Suggestion: Remove whitespace.

@rohit-ganguly , @lmazuel , @achandmsft , @mayurid , @dargilco for notification.

Aug 27 '24 08:08 zimuli157

Thank you @zimuli157 for opening this issue! I'm the owner of this SDK.

Issue #2: I removed the white space in my current PR. Thanks!

Issue #1: Please make sure your key is valid. Is this a Serverless API endpoint (aka MaaS)? Is this a chat completion model? Can you share your endpoint URL? If so, please try this simple cURL command to make sure your key is valid before running the samples (replace the two environment variables with the endpoint and key you see in Azure AI Studio). Note that this example is for API key authentication, not Entra ID authentication. Please follow up with me over IM to continue discussing this. Thanks!

curl  -v "%AZURE_AI_CHAT_ENDPOINT%/chat/completions" -H "Content-Type: application/json" -H "Authorization: Bearer %AZURE_AI_CHAT_KEY%" -d "{\"messages\":[{\"role\":\"user\",\"content\":\"how many feet in a mile?\"}]}"

Aug 27 '24 14:08 dargilco

@dargilco For Issue #1: I ran the command as you suggested, and the result was the same as when I ran the sample before. The results are shown below.

We think there is a mistake in this command. When we replaced Authorization: Bearer %AZURE_AI_CHAT_KEY% with api-key: %AZURE_AI_CHAT_KEY%, the authentication was successful. The results are as follows.

The complete command at this point is

curl  -v "https://***.openai.azure.com/openai/deployments/gpt-4o/chat/completions?api-version=2023-03-15-preview" -H "Content-Type: application/json" -H "api-key: {api-key}" -d '{"messages":[{"role":"user","content":"how many feet in a mile?"}]}'
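For comparison, here is a minimal Python sketch of the two header styles discussed above, using only the standard library. The helper name build_chat_request, the YOUR-RESOURCE host, and the "<your-key>" value are placeholders for illustration; the request is built but not sent.

```python
import json
import urllib.request


def build_chat_request(endpoint_url: str, key: str, use_api_key_header: bool) -> urllib.request.Request:
    """Build (but do not send) a chat completions POST request."""
    headers = {"Content-Type": "application/json"}
    if use_api_key_header:
        # Header style that authenticated successfully against the Azure OpenAI endpoint.
        headers["api-key"] = key
    else:
        # Header style used by the originally suggested cURL command.
        headers["Authorization"] = f"Bearer {key}"
    body = json.dumps({"messages": [{"role": "user", "content": "how many feet in a mile?"}]})
    return urllib.request.Request(
        endpoint_url, data=body.encode("utf-8"), headers=headers, method="POST"
    )


# Placeholder endpoint and key; substitute your own values.
request = build_chat_request(
    "https://YOUR-RESOURCE.openai.azure.com/openai/deployments/gpt-4o/chat/completions"
    "?api-version=2023-03-15-preview",
    "<your-key>",
    use_api_key_header=True,
)
# Note: urllib normalizes header-name capitalization; HTTP header names are case-insensitive.
print(request.header_items())
# To actually send it: urllib.request.urlopen(request)
```

Switching use_api_key_header to False reproduces the Authorization: Bearer form from the earlier command.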

Aug 30 '24 06:08 jerryshia

@jerryshia Yes, if your model is an OpenAI model hosted on Azure OpenAI, you need the "api-key: " header. Please see new package docs here, showing how to create a ChatCompletionsClient for Azure OpenAI endpoint: https://github.com/Azure/azure-sdk-for-python/tree/main/sdk/ai/azure-ai-inference#create-and-authenticate-a-client-directly-using-api-key-or-github-token

There are also a few samples (with file names containing azure_openai) in the samples folder: https://github.com/Azure/azure-sdk-for-python/tree/main/sdk/ai/azure-ai-inference/samples

Sep 03 '24 15:09 dargilco

@dargilco When I use the same endpoint as the endpoint URL to run the sample file, the above issue occurs. I believe the aforementioned issue also exists in the SDK, which can lead to similar errors. Please check the way requests are sent in the SDK's code.

Sep 04 '24 02:09 jerryshia

@dargilco Any ideas?

Sep 26 '24 07:09 v-xuto

@v-xuto please provide full details about your issue, including source code and SDK logs from your failed run.

To enable SDK logging, add this at the top of your code:

import sys
import logging

logger = logging.getLogger("azure")
logger.setLevel(logging.DEBUG)
logger.addHandler(logging.StreamHandler(stream=sys.stdout))

And add this additional input parameter to the constructor for ChatCompletionsClient:

logging_enable=True

Please make sure to remove any secrets when sharing here (api-key).
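One possible way to scrub a captured log before pasting it is sketched below. The helper redact_secrets and the list of header names are illustrative assumptions, not part of the SDK, and the pattern only targets the quoted 'Header': 'value' layout that the SDK's DEBUG output uses, so please still review the result by hand.

```python
import re

# Header names whose values should not be shared (an illustrative list, extend as needed).
SENSITIVE_HEADERS = ("api-key", "Authorization")

# Matches e.g.  'Authorization': 'Bearer abc'  or  "api-key": "abc"
_SECRET_PATTERN = re.compile(
    r"(?P<prefix>['\"]?(?:"
    + "|".join(re.escape(name) for name in SENSITIVE_HEADERS)
    + r")['\"]?\s*:\s*)(?P<quote>['\"])(?P<value>.*?)(?P=quote)",
    re.IGNORECASE,
)


def redact_secrets(log_text: str) -> str:
    """Replace the quoted values of sensitive headers with REDACTED."""
    return _SECRET_PATTERN.sub(r"\g<prefix>\g<quote>REDACTED\g<quote>", log_text)


sample = "    'Content-Type': 'application/json'\n    'Authorization': 'Bearer abc123'"
print(redact_secrets(sample))
```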

Sep 26 '24 15:09 dargilco

@dargilco Here is the log when the run failed:

XXXX-09-27 11:19:38,316 - azure.core.pipeline.policies._universal - DEBUG - Request URL: 'https://{}.openai.azure.com/openai/deployments/gpt-35-turbo/chat/completions?api-version={}'
Request method: 'POST'
Request headers:
    'Content-Type': 'application/json'
    'Content-Length': '138'
    'Accept': 'application/json'
    'x-ms-client-request-id': '{request-id}'
    'User-Agent': 'azsdk-python-ai-inference/1.0.0b4 Python/3.11.9 (Windows-10-10.0.22631-SP0)'
    'Authorization': 'Bearer {key}'
Request body:
{"messages": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "How many feet are in a mile?"}]}
XXXX-09-27 11:19:38,316 - azure.core.pipeline.policies.http_logging_policy - INFO - Request URL: 'https://{}.openai.azure.com/openai/deployments/gpt-35-turbo/chat/completions?api-version=REDACTED'
Request method: 'POST'
Request headers:
    'Content-Type': 'application/json'
    'Content-Length': '138'
    'Accept': 'application/json'
    'x-ms-client-request-id': '{request-id}'
    'User-Agent': 'azsdk-python-ai-inference/1.0.0b4 Python/3.11.9 (Windows-10-10.0.22631-SP0)'
    'Authorization': 'REDACTED'
A body is sent with the request
XXXX-09-27 11:19:38,319 - urllib3.connectionpool - DEBUG - Starting new HTTPS connection (1): {}.openai.azure.com:443
XXXX-09-27 11:19:39,584 - urllib3.connectionpool - DEBUG - https://{}.openai.azure.com:443 "POST /openai/deployments/gpt-35-turbo/chat/completions?api-version={} HTTP/11" 4XX 161
XXXX-09-27 11:19:39,588 - azure.core.pipeline.policies.http_logging_policy - INFO - Response status: 4XX
Response headers:
    'Content-Length': '161'
    'Content-Type': 'application/json'
    'x-ms-client-request-id': '{request-id}'
    'apim-request-id': 'REDACTED'
    'Strict-Transport-Security': 'REDACTED'
    'x-content-type-options': 'REDACTED'
    'Date': 'Fri, 27 Sep XXXX 03:19:38 GMT'
XXXX-09-27 11:19:39,588 - azure.core.pipeline.policies._universal - DEBUG - Response status: '4XX'
Response headers:
    'Content-Length': '161'
    'Content-Type': 'application/json'
    'x-ms-client-request-id': '53f70d41-7c7f-11ef-b41a-cc96e5348815'
    'apim-request-id': '{request-id}'
    'Strict-Transport-Security': 'max-age=31536000; includeSubDomains; preload'
    'x-content-type-options': 'nosniff'
    'Date': 'Fri, 27 Sep XXXX 03:19:38 GMT'
Response content:
{ "statusCode": 4XX, "message": "Unauthorized. Access token is missing, invalid, audience is incorrect (https://cognitiveservices.azure.com), or have expired." }

The file run was sample_chat_completions.py.

We found that the run succeeded when we added the parameter headers={'api-key': {api-key}} to ChatCompletionsClient. We believe the credential parameter may not work here: as the failed log shows, the key is sent as an Authorization: Bearer header rather than as an api-key header. Please check the code. Here is the log when the run passed.

2024-09-27 15:40:50,282 - azure.core.pipeline.policies._universal - DEBUG - Request URL: 'https://{}.openai.azure.com/openai/deployments/gpt-35-turbo/chat/completions?api-version=2024-05-01-preview'
Request method: 'POST'
Request headers:
    'Content-Type': 'application/json'
    'Content-Length': '138'
    'api-key': '{}'
    'Accept': 'application/json'
    'x-ms-client-request-id': '{request-id}'
    'User-Agent': 'azsdk-python-ai-inference/1.0.0b4 Python/3.10.11 (Windows-10-10.0.22631-SP0)'
    'Authorization': 'Bearer 123'
Request body:
{"messages": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "How many feet are in a mile?"}]}
2024-09-27 15:40:50,282 - azure.core.pipeline.policies.http_logging_policy - INFO - Request URL: 'https://{}.openai.azure.com/openai/deployments/gpt-35-turbo/chat/completions?api-version=REDACTED'
Request method: 'POST'
Request headers:
    'Content-Type': 'application/json'
    'Content-Length': '138'
    'api-key': 'REDACTED'
    'Accept': 'application/json'
    'x-ms-client-request-id': '{request-id}'
    'User-Agent': 'azsdk-python-ai-inference/1.0.0b4 Python/3.10.11 (Windows-10-10.0.22631-SP0)'
    'Authorization': 'REDACTED'
A body is sent with the request
2024-09-27 15:40:50,282 - urllib3.connectionpool - DEBUG - Starting new HTTPS connection (1): {}.openai.azure.com:443
2024-09-27 15:40:51,785 - urllib3.connectionpool - DEBUG - https://{}.openai.azure.com:443 "POST /openai/deployments/gpt-35-turbo/chat/completions?api-version=2024-05-01-preview HTTP/1.1" 200 997
2024-09-27 15:40:51,785 - azure.core.pipeline.policies.http_logging_policy - INFO - Response status: 200
Response headers:
    'Cache-Control': 'no-cache, must-revalidate'
    'Content-Length': '997'
    'Content-Type': 'application/json'
    'access-control-allow-origin': 'REDACTED'
    'apim-request-id': 'REDACTED'
    'Strict-Transport-Security': 'REDACTED'
    'x-content-type-options': 'REDACTED'
    'x-ms-region': 'REDACTED'
    'x-ratelimit-remaining-requests': 'REDACTED'
    'x-ratelimit-remaining-tokens': 'REDACTED'
    'x-accel-buffering': 'REDACTED'
    'x-ms-rai-invoked': 'REDACTED'
    'x-request-id': 'REDACTED'
    'x-ms-client-request-id': '{request-id}'
    'azureml-model-session': 'REDACTED'
    'Date': 'Fri, 27 Sep 2024 07:40:51 GMT'
2024-09-27 15:40:51,796 - azure.core.pipeline.policies._universal - DEBUG - Response status: '200'
Response headers:
    'Cache-Control': 'no-cache, must-revalidate'
    'Content-Length': '997'
    'Content-Type': 'application/json'
    'access-control-allow-origin': '*'
    'apim-request-id': '{request-id}'
    'Strict-Transport-Security': 'max-age=31536000; includeSubDomains; preload'
    'x-content-type-options': 'nosniff'
    'x-ms-region': 'East US'
    'x-ratelimit-remaining-requests': '119'
    'x-ratelimit-remaining-tokens': '119984'
    'x-accel-buffering': 'no'
    'x-ms-rai-invoked': 'true'
    'x-request-id': 'c9f033a4-2537-4367-aed5-e1545d83dee8'
    'x-ms-client-request-id': '{request-id}'
    'azureml-model-session': 'turbo-0301-42727a61'
    'Date': 'Fri, 27 Sep 2024 07:40:51 GMT'
Response content:
{"choices":[{"content_filter_results":{"hate":{"filtered":false,"severity":"safe"},"protected_material_code":{"filtered":false,"detected":false},"protected_material_text":{"filtered":false,"detected":false},"self_harm":{"filtered":false,"severity":"safe"},"sexual":{"filtered":false,"severity":"safe"},"violence":{"filtered":false,"severity":"safe"}},"finish_reason":"stop","index":0,"logprobs":null,"message":{"content":"There are 5280 feet in a mile.","role":"assistant"}}],"created":1727422851,"id":"chatcmpl-ABzspR3IFAcXtfdu0SKJUuw5dTSqI","model":"gpt-35-turbo","object":"chat.completion","prompt_filter_results":[{"prompt_index":0,"content_filter_results":{"hate":{"filtered":false,"severity":"safe"},"jailbreak":{"filtered":false,"detected":false},"self_harm":{"filtered":false,"severity":"safe"},"sexual":{"filtered":false,"severity":"safe"},"violence":{"filtered":false,"severity":"safe"}}}],"system_fingerprint":null,"usage":{"completion_tokens":10,"prompt_tokens":27,"total_tokens":37}}
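The workaround described above (passing an extra headers argument to ChatCompletionsClient on 1.0.0b4) might look roughly like the sketch below. The helper names build_client_kwargs and create_client are hypothetical, the "placeholder" credential value is an assumption (the passing log shows 'Bearer 123', which suggests a dummy value was used), and the endpoint in the usage comment is a placeholder.

```python
def build_client_kwargs(endpoint: str, key: str) -> dict:
    """Keyword arguments for ChatCompletionsClient, following the workaround above."""
    return {
        "endpoint": endpoint,
        # Extra header carrying the key, which the Azure OpenAI endpoint accepted.
        "headers": {"api-key": key},
        # Turns on request/response logging, as suggested earlier in the thread.
        "logging_enable": True,
    }


def create_client(endpoint: str, key: str):
    """Create the client; requires the azure-ai-inference package."""
    # Imported lazily so build_client_kwargs can be used without the SDK installed.
    from azure.ai.inference import ChatCompletionsClient
    from azure.core.credentials import AzureKeyCredential

    # A credential is still passed; in 1.0.0b4 it is sent as "Authorization: Bearer ...",
    # which the endpoint in the passing log ignored once the api-key header was present.
    return ChatCompletionsClient(
        credential=AzureKeyCredential("placeholder"),
        **build_client_kwargs(endpoint, key),
    )


# Example usage (placeholder endpoint, not executed here):
# client = create_client(
#     "https://YOUR-RESOURCE.openai.azure.com/openai/deployments/gpt-35-turbo", "<your-key>"
# )
```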

Sep 27 '24 08:09 jerryshia

Closing this issue, as support for the api-key header was added in SDK version 1.0.0b5 (2024-10-16).
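To check whether an installed copy is new enough, something like the following sketch could be used. The helper supports_api_key_header is hypothetical and assumes version strings of the form X.Y.Z or X.Y.ZbN, as seen in this thread; other version formats are rejected rather than guessed at.

```python
import re
from importlib import metadata

_VERSION_RE = re.compile(r"^(\d+)\.(\d+)\.(\d+)(?:b(\d+))?$")


def supports_api_key_header(version: str) -> bool:
    """Return True if the version is 1.0.0b5 or later."""
    match = _VERSION_RE.match(version)
    if match is None:
        raise ValueError(f"Unrecognized version format: {version!r}")
    major, minor, patch, beta = match.groups()
    release = (int(major), int(minor), int(patch))
    # A final release sorts after every beta of the same release number.
    beta_rank = float("inf") if beta is None else int(beta)
    return (release, beta_rank) >= ((1, 0, 0), 5)


try:
    installed = metadata.version("azure-ai-inference")
    print(installed, supports_api_key_header(installed))
except metadata.PackageNotFoundError:
    print("azure-ai-inference is not installed")
except ValueError as error:
    print(error)
```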

Jan 09 '25 16:01 dargilco