Realtime Session Update Configuration
Confirm this is an issue with the Python library and not an underlying OpenAI API
- [x] This is an issue with the Python library
Describe the bug
The intended behavior is to disable server-side VAD for the OpenAI Realtime model. We are using LiveKit to facilitate the websocket connection, but the bug is in the OpenAI library.
In particular, the openai.resources.beta.realtime.AsyncRealtimeConnection.send method uses event.to_json(use_api_names=True, exclude_defaults=True, exclude_unset=True) to serialize the SessionUpdateEvent. The issue is the exclude_defaults=True argument, which omits every field whose value equals its declared default, even when that value was set explicitly.
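The effect can be modeled with a stdlib-only sketch. This is not the library's code: the Session dataclass and the dump helper are hypothetical stand-ins that only imitate the "drop fields equal to their default" rule described above.

```python
from dataclasses import dataclass, fields
from typing import Any, Optional


@dataclass
class Session:
    # Hypothetical stand-in for the library's Session model.
    voice: Optional[str] = None
    turn_detection: Optional[dict] = None


def dump(obj: Any, *, exclude_defaults: bool) -> dict:
    """Imitate a serializer that can skip fields equal to their default."""
    out = {}
    for f in fields(obj):
        value = getattr(obj, f.name)
        if exclude_defaults and value == f.default:
            continue  # an explicit None is indistinguishable from "not set"
        out[f.name] = value
    return out


session = Session(voice="alloy", turn_detection=None)
kept = dump(session, exclude_defaults=False)
dropped = dump(session, exclude_defaults=True)
```

Here kept still carries the turn_detection key with a None value, while dropped contains only voice, so the request to disable VAD never reaches the wire.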
We have confirmed that two consecutive SessionUpdateEvents are composed: a change made by the first event persists in the configuration that results from the second. This makes the exclude_defaults=True argument particularly problematic, because there is no way to change a value away from its default and later change it back.
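A minimal sketch of why composition makes this worse, assuming the server merges each update into the current session state key by key. That merge rule is our reading of the observed behavior, not a documented guarantee, and apply_update is a hypothetical helper.

```python
def apply_update(current: dict, update: dict) -> dict:
    """Merge an update into the session state; omitted keys stay unchanged."""
    merged = dict(current)
    merged.update(update)
    return merged


# Values taken from the SessionCreatedEvent shown later in this report.
server_vad = {"type": "server_vad", "threshold": 0.5, "silence_duration_ms": 200}
state = {"voice": "alloy", "turn_detection": server_vad}

# The serializer dropped turn_detection, so the update only carries voice.
after_dropped = apply_update(state, {"voice": "alloy"})

# What we actually want to send: an explicit null for turn_detection.
after_null = apply_update(state, {"voice": "alloy", "turn_detection": None})
```

Under this model after_dropped keeps server-side VAD, and only after_null disables it.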
There are a couple of problems here. For VAD in particular, although Session(BaseModel) declares turn_detection=None as its default, the server's actual default is not None but a set of server-side VAD values. When you pass None in the SessionUpdateEvent, turn_detection cannot change because (1) exclude_defaults=True strips the field from the payload and (2) the declared default does not match what the server actually uses by default.
There are two solutions:
- Remove exclude_defaults=True
- Update the default turn_detection in Session
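Until one of these lands, a possible workaround is to build the event as a plain dict so that the null survives serialization. The sketch below only constructs the JSON string; build_session_update is a hypothetical helper, and how the string gets sent depends on the websocket client in use.

```python
import json


def build_session_update(session_fields: dict) -> str:
    """Serialize a session.update event without dropping None values."""
    event = {"type": "session.update", "session": session_fields}
    return json.dumps(event)


# json.dumps writes None as null, so the field stays in the payload.
payload = build_session_update({"turn_detection": None})
```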
To Reproduce
Please follow the steps below.
Code snippets
Run python minimal_worker.py console using LiveKit agents on branch dev-1.0 with the following model configuration:
agent = VoiceAgent(
    instructions="You are a helpful assistant that can answer questions and help with tasks.",
    llm=openai.realtime.RealtimeModel(
        model="gpt-4o-realtime-preview-2024-12-17",
        voice="alloy"
    )
)
Then, within the _main_task of RealtimeSession, we hardcode the turn_detection=None parameter as follows:
self._msg_ch.send_nowait(
    SessionUpdateEvent(
        type="session.update",
        session=session_update_event.Session(
            model=self._realtime_model._opts.model,  # type: ignore
            voice=self._realtime_model._opts.voice,  # type: ignore
            input_audio_transcription=input_audio_transcription,
            turn_detection=None
        ),
        event_id=utils.shortuuid("session_update_"),
    )
)
The issue here is that turn_detection never gets updated properly according to the SessionUpdatedEvent. This is related to the problem that this PR was attempting to solve.
For example, we get:
- The SessionCreatedEvent with the default turn_detection. By the way, even after passing the query param turn_detector= (a null value) in the websocket URI, it still returns with server-side VAD.
  SessionCreatedEvent(..., turn_detection=TurnDetection(create_response=True, interrupt_response=True, prefix_padding_ms=300, silence_duration_ms=200, threshold=0.5, type='server_vad'), ...), type='session.created')
- After passing the turn_detection=None argument to the SessionUpdateEvent as mentioned above, we still eventually observe a SessionUpdatedEvent with server-side VAD enabled.
  SessionUpdatedEvent(..., turn_detection=TurnDetection(create_response=True, interrupt_response=True, prefix_padding_ms=300, silence_duration_ms=200, threshold=0.5, type='server_vad'), ...), type='session.updated')
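To check for this automatically, something like the following could be run against the raw session.updated message. It is a sketch: vad_disabled is a hypothetical helper, and the sample JSON is hand-written from the values above rather than a captured message.

```python
import json


def vad_disabled(raw_event: str) -> bool:
    """Return True only if a session.updated event reports turn_detection as null."""
    event = json.loads(raw_event)
    if event.get("type") != "session.updated":
        raise ValueError("expected a session.updated event")
    session = event.get("session", {})
    return "turn_detection" in session and session["turn_detection"] is None


# What we observe today: server-side VAD is still configured.
observed = json.dumps({
    "type": "session.updated",
    "session": {"turn_detection": {"type": "server_vad", "threshold": 0.5}},
})

# What we expect after sending turn_detection=None.
expected = json.dumps({
    "type": "session.updated",
    "session": {"turn_detection": None},
})
```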
OS
macOS
Python version
Python v3.13
Library version
openai v1.66.3