openai-python

Realtime Session Update Configuration

Open anishnag opened this issue 9 months ago • 0 comments

Confirm this is an issue with the Python library and not an underlying OpenAI API

  • [x] This is an issue with the Python library

Describe the bug

The intended behavior is to disable server-side VAD for the OpenAI Realtime model. We are using LiveKit to facilitate the websocket connection, but the bug is in the OpenAI library.

In particular, the openai.resources.beta.realtime.AsyncRealtimeConnection.send method serializes the SessionUpdateEvent with event.to_json(use_api_names=True, exclude_defaults=True, exclude_unset=True). The problem is the exclude_defaults=True argument, which omits every field whose value equals its declared default, even when that value was set explicitly.
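The effect of the two exclusion flags can be reproduced without the library. The following is a minimal stdlib-only sketch that models the behavior described above; SessionModel, dump, and fields_set are hypothetical names for illustration and are not the openai or Pydantic implementation.

```python
from dataclasses import dataclass, field, fields
from typing import Optional


@dataclass
class SessionModel:
    """Hypothetical stand-in for session_update_event.Session."""
    voice: Optional[str] = None
    turn_detection: Optional[dict] = None
    # Names of the fields the caller passed explicitly (models "set" tracking).
    fields_set: set = field(default_factory=set)


def dump(model: SessionModel, exclude_unset: bool, exclude_defaults: bool) -> dict:
    """Serialize the model, applying the two exclusion rules independently."""
    out = {}
    for f in fields(model):
        if f.name == "fields_set":
            continue  # bookkeeping only, never serialized
        value = getattr(model, f.name)
        if exclude_unset and f.name not in model.fields_set:
            continue  # caller never provided this field
        if exclude_defaults and value == f.default:
            continue  # value equals the declared default, so it is dropped
        out[f.name] = value
    return out


# turn_detection=None is set explicitly, but None is also the declared default.
session = SessionModel(
    voice="alloy",
    turn_detection=None,
    fields_set={"voice", "turn_detection"},
)
with_flag = dump(session, exclude_unset=True, exclude_defaults=True)
without_flag = dump(session, exclude_unset=True, exclude_defaults=False)
print(with_flag)     # {'voice': 'alloy'}
print(without_flag)  # {'voice': 'alloy', 'turn_detection': None}
```

In this model, exclude_unset alone keeps the explicit None, while adding exclude_defaults removes it, which matches the symptom reported here.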

We have confirmed that successive SessionUpdateEvents are composed: a change made by the first event is still reflected in the configuration that results from the second. This makes exclude_defaults=True particularly problematic, because once a field has been changed away from its default there is no way to change it back.
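To make the consequence concrete, here is a small sketch of that composition, assuming the server merges each partial update into its current session state. CLIENT_DEFAULTS, serialize, and apply_update are hypothetical names, and the merge rule is an assumption based on the behavior observed above rather than documented server logic.

```python
# Assumed client-side default for the field, as declared in Session.
CLIENT_DEFAULTS = {"turn_detection": None}


def serialize(update: dict) -> dict:
    """Drop keys whose value equals the client default (models exclude_defaults=True)."""
    return {
        key: value
        for key, value in update.items()
        if not (key in CLIENT_DEFAULTS and CLIENT_DEFAULTS[key] == value)
    }


def apply_update(server_state: dict, payload: dict) -> dict:
    """Merge a partial update; keys absent from the payload keep their old value."""
    return {**server_state, **payload}


state = {"turn_detection": None}

# First update: move away from the default. The field survives serialization.
state = apply_update(state, serialize({"turn_detection": {"type": "server_vad"}}))

# Second update: try to go back to None. The field is stripped, so nothing changes.
state = apply_update(state, serialize({"turn_detection": None}))

print(state)  # {'turn_detection': {'type': 'server_vad'}}
```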

There are a couple of problems here. For VAD in particular, although Session(BaseModel) declares turn_detection=None as the default, the value the server actually starts with is not None but a default server-side VAD configuration. As a result, passing None in the SessionUpdateEvent cannot change turn_detection, because (1) exclude_defaults=True strips the field from the payload and (2) the library's declared default does not match the real server-side default.

There are two solutions:

  • Remove exclude_defaults=True
  • Update the default turn_detection in Session
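Either fix comes down to getting an explicit null for turn_detection onto the wire. As an illustration of the target payload only, the sketch below builds the event with the standard json module. build_session_update is a hypothetical helper, and whether the connection's send method accepts a pre-built payload is not confirmed here.

```python
import json


def build_session_update(session_fields: dict, event_id: str) -> str:
    """Build a session.update event as JSON, keeping explicit None values as null."""
    event = {
        "type": "session.update",
        "event_id": event_id,
        "session": session_fields,
    }
    # json.dumps keeps keys whose value is None and writes them as null.
    return json.dumps(event)


payload = build_session_update(
    {"turn_detection": None},
    event_id="session_update_example",  # placeholder id for illustration
)
print(payload)
```

Unlike the serialization path described above, plain json.dumps never filters by default value, so the null reaches the payload.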

To Reproduce

Please follow the steps below.

Code snippets

Run python minimal_worker.py console using LiveKit agents on branch dev-1.0 with the following model configuration:

agent = VoiceAgent(
    instructions="You are a helpful assistant that can answer questions and help with tasks.",
    llm=openai.realtime.RealtimeModel(
        model="gpt-4o-realtime-preview-2024-12-17",
        voice="alloy"
    )
)

Then, within the _main_task of RealtimeSession, we hardcode the turn_detection=None parameter as follows:

self._msg_ch.send_nowait(
    SessionUpdateEvent(
        type="session.update",
        session=session_update_event.Session(
            model=self._realtime_model._opts.model,  # type: ignore
            voice=self._realtime_model._opts.voice,  # type: ignore
            input_audio_transcription=input_audio_transcription,
            turn_detection=None
        ),
        event_id=utils.shortuuid("session_update_"),
    )
)

The issue is that turn_detection is never actually updated, as the resulting SessionUpdatedEvent shows. This is related to the problem that this PR was attempting to solve.

For example, we get:

  • The SessionCreatedEvent arrives with the default turn_detection. Incidentally, even when the turn_detector= query parameter is passed in the websocket URI with a null value, the session still comes back with server-side VAD.
SessionCreatedEvent(..., turn_detection=TurnDetection(create_response=True, interrupt_response=True, prefix_padding_ms=300, silence_duration_ms=200, threshold=0.5, type='server_vad'), ...), type='session.created')
  • After passing turn_detection=None to the SessionUpdateEvent as described above, the SessionUpdatedEvent we eventually observe still reports server-side VAD.
SessionUpdatedEvent(..., turn_detection=TurnDetection(create_response=True, interrupt_response=True, prefix_padding_ms=300, silence_duration_ms=200, threshold=0.5, type='server_vad'), ...), type='session.updated')

OS

macOS

Python version

Python v3.13

Library version

openai v1.66.3

anishnag · Mar 13 '25 23:03