Azure OpenAI client transcribes or translates speech depending on the deployment used, not the library function
Confirm this is an issue with the Python library and not an underlying OpenAI API
- [X] This is an issue with the Python library
Describe the bug
The behaviour of the AzureOpenAI client depends on the deployment used and not on the library function called.
For an OpenAI client named client, client.audio.transcriptions transcribes speech and client.audio.translations translates it, as expected.
For an AzureOpenAI client named azure_client, however, you get a transcription or a translation depending on the endpoint specified, regardless of the library function used. The library function only determines the class of the result object, not the text.
To Reproduce
- Observe OpenAI client behavior.
import os
from openai import OpenAI, AzureOpenAI
from azure.identity import get_bearer_token_provider, DefaultAzureCredential
# Example file: https://upload.wikimedia.org/wikipedia/commons/b/b1/Candide_01_voltaire.mp3
audio_file = "./Candide_01_voltaire.mp3"
openai_client = OpenAI()
with open(audio_file, 'rb') as f:
    result = openai_client.audio.transcriptions.create(
        file=f,
        model="whisper-1",
    )
    print(f"openai_client.audio.transcriptions: {result.__class__.__name__, result.text[:60]}")
    result = openai_client.audio.translations.create(
        file=f,
        model="whisper-1",
    )
    print(f"openai_client.audio.translations: {result.__class__.__name__, result.text[:60]}")
openai_client.audio.transcriptions: ('Transcription', "Chapitre premier de Candide ou l'optimisme, de Voltaire, enregistré pour LibriVo")
openai_client.audio.translations: ('Translation', 'Chapter 1 of Candide or Optimism, by Voltaire, recorded for LibriVox.org by Bern')
- Observe that AzureOpenAI client behavior does not depend on the library function used.
token_provider = get_bearer_token_provider(DefaultAzureCredential(), "https://cognitiveservices.azure.com/.default")
azure_endpoint = os.environ['AZURE_ENDPOINT']
azure_deployment = "whisper/audio/transcriptions?api-version=2024-06-01"
azure_client = AzureOpenAI(
    azure_ad_token_provider=token_provider,
    azure_endpoint=os.environ['AZURE_ENDPOINT'],
    azure_deployment=azure_deployment,
    api_version="2024-06-01"
)
with open(audio_file, 'rb') as f:
    result = azure_client.audio.transcriptions.create(
        file=f,
        model="whisper-1",
    )
    print(f"azure_client.audio.transcriptions: {result.__class__.__name__, result.text[:60]}")
    result = azure_client.audio.translations.create(
        file=f,
        model="whisper-1",
    )
    print(f"azure_client.audio.translations: {result.__class__.__name__, result.text[:60]}")
azure_client.audio.transcriptions: ('Transcription', "Chapitre premier de Candide ou l'optimisme, de Voltaire, enr")
azure_client.audio.translations: ('Translation', "Chapitre premier de Candide ou l'optimisme, de Voltaire, enr")
azure_deployment = "whisper/audio/translations?api-version=2024-06-01"
azure_client = AzureOpenAI(
    azure_ad_token_provider=token_provider,
    azure_endpoint=os.environ['AZURE_ENDPOINT'],
    azure_deployment=azure_deployment,
    api_version="2024-06-01"
)
with open(audio_file, 'rb') as f:
    result = azure_client.audio.transcriptions.create(
        file=f,
        model="whisper-1",
    )
    print(f"azure_client.audio.transcriptions: {result.__class__.__name__, result.text[:60]}")
    result = azure_client.audio.translations.create(
        file=f,
        model="whisper-1",
    )
    print(f"azure_client.audio.translations: {result.__class__.__name__, result.text[:60]}")
azure_client.audio.transcriptions: ('Transcription', 'Chapter 1 of Candide or Optimism, by Voltaire, recorded for ')
azure_client.audio.translations: ('Translation', 'Chapter 1 of Candide or Optimism, by Voltaire, recorded for ')
Code snippets
No response
OS
Ubuntu
Python version
Python v3.12.7
Library version
openai v1.55.3
cc @kristapratico
@s-zanella can you share the reason for including the full path + API version ("whisper/audio/translations?api-version=2024-06-01") in the azure_deployment parameter? It is expected that only the deployment name, i.e. whisper, is passed and the client will build the URL.
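One possible explanation for the observed behaviour, sketched under a loud assumption: if URL construction amounts to appending the operation path and the API version to a base URL that already contains the azure_deployment string, then a deployment value with its own path and query string pushes everything appended afterwards into the query. The function naive_request_url and the host example.openai.azure.com are hypothetical and only model that assumption; this is not the library's actual code.

```python
from urllib.parse import urlsplit


def naive_request_url(endpoint: str, deployment: str, operation: str, api_version: str) -> str:
    """Hypothetical model of URL construction: plain string concatenation.

    This is an assumption for illustration, not the openai library's code.
    """
    base = f"{endpoint.rstrip('/')}/openai/deployments/{deployment}"
    return f"{base}/{operation}?api-version={api_version}"


ENDPOINT = "https://example.openai.azure.com"  # placeholder host

# Deployment name only: the operation ends up in the URL path.
good = urlsplit(naive_request_url(ENDPOINT, "whisper", "audio/translations", "2024-06-01"))

# Full path + query passed as the deployment: the first "?" starts the query,
# so the appended "audio/translations" no longer affects the path.
bad = urlsplit(
    naive_request_url(
        ENDPOINT,
        "whisper/audio/transcriptions?api-version=2024-06-01",
        "audio/translations",
        "2024-06-01",
    )
)

print(good.path)
print(bad.path)
print(bad.query)
```

Under this model, the path of the second URL still ends in audio/transcriptions even though the translations operation was requested, which would match the results reported above.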
I see. https://{endpoint}/openai/deployments/whisper/audio/translations?api-version=2024-06-01 is the endpoint URI shown in the Azure OpenAI Service portal. For other services, copying and pasting this URI and passing it as the azure_endpoint usually works. This is what I did initially; the code above is refactored to use the azure_endpoint and azure_deployment parameters, and in retrospect that makes it more evident that I should have used azure_deployment = "whisper".
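For anyone else starting from the URI copied out of the portal, here is a minimal sketch of splitting it into the three values the client expects. It assumes the URI follows the /openai/deployments/{name}/... shape quoted above; split_portal_uri is a hypothetical helper and example.openai.azure.com is a placeholder host.

```python
from urllib.parse import parse_qs, urlsplit


def split_portal_uri(uri: str) -> dict:
    """Split a portal-style URI into endpoint, deployment name and API version.

    Hypothetical helper; assumes the path looks like
    /openai/deployments/<name>/<operation...>.
    """
    parts = urlsplit(uri)
    segments = [s for s in parts.path.split("/") if s]
    if len(segments) < 3 or segments[:2] != ["openai", "deployments"]:
        raise ValueError(f"unexpected path in portal URI: {parts.path!r}")
    # parse_qs returns lists; take the first api-version value if present.
    api_version = parse_qs(parts.query).get("api-version", [None])[0]
    return {
        "azure_endpoint": f"{parts.scheme}://{parts.netloc}",
        "azure_deployment": segments[2],
        "api_version": api_version,
    }


settings = split_portal_uri(
    "https://example.openai.azure.com/openai/deployments/whisper/audio/translations"
    "?api-version=2024-06-01"
)
print(settings)
```

The resulting dictionary uses the same keyword names as the AzureOpenAI constructor calls above, so the values can be passed alongside azure_ad_token_provider.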
I tried removing the API version and using whisper/audio from the original URI but neither worked. Using just whisper works as expected. Still, the behaviour is puzzling and I feel that there should be a guard against using an endpoint that does not match the library function.
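A rough sketch of the kind of guard meant here, applied on the caller's side before constructing the client. check_deployment_name is a hypothetical helper, not something the library provides, and the set of rejected characters is an assumption about what a bare deployment name should not contain.

```python
def check_deployment_name(azure_deployment: str) -> str:
    """Reject values that look like a URL fragment rather than a deployment name.

    Hypothetical caller-side check; the rejected characters are an assumption.
    """
    suspicious = [ch for ch in ("/", "?", "&", "=") if ch in azure_deployment]
    if suspicious:
        raise ValueError(
            f"azure_deployment should be a bare deployment name, got {azure_deployment!r} "
            f"(contains {', '.join(suspicious)})"
        )
    return azure_deployment


print(check_deployment_name("whisper"))

try:
    check_deployment_name("whisper/audio/translations?api-version=2024-06-01")
except ValueError as exc:
    print(f"rejected: {exc}")
```

With a check like this, the value from the reproduction above fails immediately instead of silently selecting the operation encoded in the string.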