openai-python icon indicating copy to clipboard operation
openai-python copied to clipboard

Azure OpenAI client transcribes or translates speech depending on the deployment used, not the library function

Open s-zanella opened this issue 1 year ago • 3 comments

Confirm this is an issue with the Python library and not an underlying OpenAI API

  • [X] This is an issue with the Python library

Describe the bug

The behaviour of the AzureOpenAI client depends on the deployment used and not on the library function called.

For an OpenAI client client, client.audio.transcriptions transcribes speech and client.audio.translations translates it, as expected.

For an AzureOpenAI client azure_client, however, you get a transcription or a translation depending on the endpoint specified regardless of the library function used. The library function used only determines the class of the result object, not the text.

To Reproduce

  1. Observe OpenAI client behavior.
import os
from openai import OpenAI, AzureOpenAI
from azure.identity import get_bearer_token_provider, DefaultAzureCredential

# Example file: https://upload.wikimedia.org/wikipedia/commons/b/b1/Candide_01_voltaire.mp3
audio_file = "./Candide_01_voltaire.mp3"

openai_client = OpenAI()

with open(audio_file, 'rb') as f:
    result = openai_client.audio.transcriptions.create(
        file=f,            
        model="whisper-1",
    )

    print(f"openai_client.audio.transcriptions: {result.__class__.__name__, result.text[:60]}")

    result = openai_client.audio.translations.create(
        file=f,            
        model="whisper-1",
    )

    print(f"openai_client.audio.translations: {result.__class__.__name__, result.text[:60]}")
openai_client.audio.transcriptions: ('Transcription', "Chapitre premier de Candide ou l'optimisme, de Voltaire, enregistré pour LibriVo")
openai_client.audio.translations: ('Translation', 'Chapter 1 of Candide or Optimism, by Voltaire, recorded for LibriVox.org by Bern')
  1. Observe that AzureOpenAI client behavior does not depend on the library function used.
token_provider = get_bearer_token_provider(DefaultAzureCredential(), "https://cognitiveservices.azure.com/.default")
azure_endpoint = os.environ['AZURE_ENDPOINT']
azure_deployment = "whisper/audio/transcriptions?api-version=2024-06-01"

azure_client = AzureOpenAI(
    azure_ad_token_provider=token_provider,
    azure_endpoint=os.environ['AZURE_ENDPOINT'],
    azure_deployment=azure_deployment,
    api_version="2024-06-01"
)

with open(audio_file, 'rb') as f:
    result = azure_client.audio.transcriptions.create(
        file=f,            
        model="whisper-1",
    )

    print(f"azure_client.audio.transcriptions: {result.__class__.__name__, result.text[:60]}")

    result = azure_client.audio.translations.create(
        file=f,            
        model="whisper-1",
    )

    print(f"azure_client.audio.translations: {result.__class__.__name__, result.text[:60]}")
azure_client.audio.transcriptions: ('Transcription', "Chapitre premier de Candide ou l'optimisme, de Voltaire, enr")
azure_client.audio.translations: ('Translation', "Chapitre premier de Candide ou l'optimisme, de Voltaire, enr")
azure_deployment = "whisper/audio/translations?api-version=2024-06-01"

azure_client = AzureOpenAI(
    azure_ad_token_provider=token_provider,
    azure_endpoint=os.environ['AZURE_ENDPOINT'],
    azure_deployment=azure_deployment,
    api_version="2024-06-01"
)

with open(audio_file, 'rb') as f:
    result = azure_client.audio.transcriptions.create(
        file=f,            
        model="whisper-1",
    )

    print(f"azure_client.audio.transcriptions: {result.__class__.__name__, result.text[:60]}")

    result = azure_client.audio.translations.create(
        file=f,            
        model="whisper-1",
    )

    print(f"azure_client.audio.translations: {result.__class__.__name__, result.text[:60]}")
azure_client.audio.transcriptions: ('Transcription', 'Chapter 1 of Candide or Optimism, by Voltaire, recorded for ')
azure_client.audio.translations: ('Translation', 'Chapter 1 of Candide or Optimism, by Voltaire, recorded for ')

Code snippets

No response

OS

Ubuntu

Python version

Python v3.12.7

Library version

openai v1.55.3

s-zanella avatar Nov 29 '24 13:11 s-zanella

cc @kristapratico

RobertCraigie avatar Nov 29 '24 15:11 RobertCraigie

@s-zanella can you share the reason for including the full path + API version ("whisper/audio/translations?api-version=2024-06-01") in the azure_deployment parameter? It is expected that only the deployment name, i.e. whisper, is passed and the client will build the URL.

kristapratico avatar Dec 02 '24 18:12 kristapratico

I see. https://{endpoint}/openai/deployments/whisper/audio/translations?api-version=2024-06-01 is the endpoint URI that shows in the Azure OpenAI Service portal. Usually for other services, copy & pasting this and passing it as the azure_endpoint works. This is what I did initially; the code above is refactored to use azure_endpoint and azure_deployment parameters and in retrospective makes it more evident that I should have used azure_deployment = "whisper" .

I tried removing the API version and using whisper/audio from the original URI but neither worked. Using just whisper works as expected. Still, the behaviour is puzzling and I feel that there should be a guard against using an endpoint that does not match the library function.

s-zanella avatar Dec 03 '24 10:12 s-zanella