Add PREFLIGHT_TRANSCRIPT STT support to the Speechmatics plugin
Feature Type
Would make my life easier
Feature Description
The current LiveKit Speechmatics plugin does not emit PREFLIGHT_TRANSCRIPT events, which prevents the preemptive generation feature from working, and ADAPTIVE mode ends up relying on the end-of-utterance timer to expire before finishing.
Here's my mental model of how preemptive generation works:
- STT emits `PREFLIGHT_TRANSCRIPT` with stable partial text
- LiveKit's `AgentSession` starts LLM inference speculatively
- When `FINAL_TRANSCRIPT` arrives, if it matches the preflight, the LLM response is already partially generated
- If the final differs, the speculative generation is discarded
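To make the dependency concrete, here's a rough sketch of that speculative-generation pattern on the consumer side. This is not LiveKit's actual `AgentSession` code; `llm_generate()` is a hypothetical stand-in for the real LLM call:

```python
import asyncio

from livekit.agents import stt


async def llm_generate(prompt: str) -> str:
    return ""  # placeholder for the actual LLM call


class SpeculativeGenerator:
    """Sketch only: start the LLM on the preflight text, keep or discard it on final."""

    def __init__(self) -> None:
        self._task: asyncio.Task[str] | None = None
        self._text: str | None = None

    async def on_stt_event(self, ev: stt.SpeechEvent) -> str | None:
        if ev.type == stt.SpeechEventType.PREFLIGHT_TRANSCRIPT:
            # stable partial text -> start LLM inference speculatively
            self._text = ev.alternatives[0].text
            self._task = asyncio.create_task(llm_generate(self._text))
            return None

        if ev.type == stt.SpeechEventType.FINAL_TRANSCRIPT:
            final_text = ev.alternatives[0].text
            if self._task is not None and self._text == final_text:
                # preflight matched the final: the response is already in flight
                return await self._task
            if self._task is not None:
                self._task.cancel()  # final differed: discard the speculation
            return await llm_generate(final_text)

        return None
```

Without the PREFLIGHT event, the first branch never runs, so nothing is ever speculated.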
Current State
The livekit-plugins-speechmatics plugin only emits:
- `INTERIM_TRANSCRIPT` - for UI visualization
- `FINAL_TRANSCRIPT` - when the utterance is complete
- `END_OF_SPEECH` - after the final transcript
- `RECOGNITION_USAGE` - for billing/metrics
It does not emit PREFLIGHT_TRANSCRIPT, so preemptive generation never triggers.
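You can confirm this by logging the event types coming off a stream. A rough sketch, assuming the usual livekit-agents streaming STT interface and that audio frames are pushed in from elsewhere:

```python
from livekit.plugins import speechmatics


async def dump_event_types() -> None:
    stt_impl = speechmatics.STT()
    stream = stt_impl.stream()
    # ... push audio frames via stream.push_frame(frame) from your audio source ...
    async for ev in stream:
        # Today this only ever prints INTERIM_TRANSCRIPT, FINAL_TRANSCRIPT,
        # END_OF_SPEECH and RECOGNITION_USAGE - never PREFLIGHT_TRANSCRIPT.
        print(ev.type)
```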
How Other Providers Implement This
Deepgram (livekit-plugins-deepgram/stt_v2.py):
```python
# Emits PREFLIGHT_TRANSCRIPT on TurnStarted event
elif event_type == "TurnStarted":
    if not self._speaking:
        return
    self._send_transcript_event(stt.SpeechEventType.PREFLIGHT_TRANSCRIPT, data)
```
AssemblyAI (livekit-plugins-assemblyai/stt.py):
```python
# Emits PREFLIGHT_TRANSCRIPT based on confidence threshold
final_event = stt.SpeechEvent(
    type=stt.SpeechEventType.PREFLIGHT_TRANSCRIPT,
    alternatives=[
        stt.SpeechData(
            language="en",
            text=preflight_text,
            confidence=avg_confidence,
        )
    ],
)
```
The Problem Today
Issue 1: No PREFLIGHT_TRANSCRIPT Emission
In stt.py lines 571-574:
```python
if not finalized:
    event_type = stt.SpeechEventType.INTERIM_TRANSCRIPT  # Always interim
else:
    event_type = stt.SpeechEventType.FINAL_TRANSCRIPT  # Always final
```
There's no code path to emit PREFLIGHT_TRANSCRIPT.
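One way to close the gap is a third branch in that same spot, keyed off whatever stability or turn-prediction signal the plugin adopts. A minimal sketch; `is_stable_partial` is a hypothetical placeholder for that signal:

```python
# Sketch only: `is_stable_partial` stands in for a stability/turn-prediction flag
if finalized:
    event_type = stt.SpeechEventType.FINAL_TRANSCRIPT
elif is_stable_partial:
    event_type = stt.SpeechEventType.PREFLIGHT_TRANSCRIPT
else:
    event_type = stt.SpeechEventType.INTERIM_TRANSCRIPT
```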
Issue 2: ADAPTIVE Mode Not Fully Implemented
The END_OF_UTTERANCE handler is only registered for FIXED mode (lines 472-476):
```python
if opts.end_of_utterance_mode == EndOfUtteranceMode.FIXED:
    @self._client.on(ServerMessageType.END_OF_UTTERANCE)
    def _evt_on_end_of_utterance(message: dict[str, Any]) -> None:
        self._handle_end_of_utterance()
```
ADAPTIVE mode doesn't register this handler, meaning EOU detection falls back to the timer only.
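A minimal sketch of a fix, assuming `EndOfUtteranceMode.ADAPTIVE` is the corresponding enum member and the existing handler is safe to reuse:

```python
# Register END_OF_UTTERANCE for both FIXED and ADAPTIVE modes
if opts.end_of_utterance_mode in (EndOfUtteranceMode.FIXED, EndOfUtteranceMode.ADAPTIVE):
    @self._client.on(ServerMessageType.END_OF_UTTERANCE)
    def _evt_on_end_of_utterance(message: dict[str, Any]) -> None:
        self._handle_end_of_utterance()
```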
Issue 3: conversation_config Only Set for FIXED Mode
Lines 387-393:
```python
if (
    self._stt_options.end_of_utterance_silence_trigger
    and self._stt_options.end_of_utterance_mode == EndOfUtteranceMode.FIXED
):
    transcription_config.conversation_config = ConversationConfig(...)
```
ADAPTIVE mode should also configure conversation_config with appropriate settings.
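For example, the guard above could be widened along these lines (a sketch; the right `ConversationConfig` values for ADAPTIVE still need to be decided, so the `...` is left as in the original):

```python
# Sketch: configure conversation_config for ADAPTIVE as well as FIXED
if self._stt_options.end_of_utterance_silence_trigger and self._stt_options.end_of_utterance_mode in (
    EndOfUtteranceMode.FIXED,
    EndOfUtteranceMode.ADAPTIVE,
):
    transcription_config.conversation_config = ConversationConfig(...)
```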
Add PREFLIGHT support please
Use the Speechmatics TURN_PREDICTION signal (the native approach): Speechmatics already has the signals needed, they're just not being used by the plugin yet.
Available Signals (Currently Unused)
| Signal | Description |
|---|---|
| `TURN_PREDICTION` | Predicts turn ending with TTL and reasoning |
| `SMART_TURN_RESULT` | Acoustic turn-completion prediction |
| `SmartTurnConfig` | Enables smart turn with configurable threshold |
| `VAD_STATUS` | Speech probability (float) |
Recommended Implementation
```python
# In STTOptions, add:
@dataclasses.dataclass
class STTOptions:
    # ... existing fields ...
    enable_smart_turn: bool = True
    smart_turn_threshold: float = 0.5


# In _run(), register TURN_PREDICTION handler:
if opts.enable_smart_turn:
    @self._client.on(ServerMessageType.TURN_PREDICTION)  # or appropriate message type
    def _evt_on_turn_prediction(message: dict[str, Any]) -> None:
        # When we predict turn is ending, emit current partial as PREFLIGHT
        self._handle_turn_prediction(message)


def _handle_turn_prediction(self, message: dict[str, Any]) -> None:
    """Emit PREFLIGHT_TRANSCRIPT when turn prediction received."""
    speech_frames = self._get_frames_from_fragments()
    if not speech_frames or not any(frame.is_active for frame in speech_frames):
        return

    # Emit current transcript as PREFLIGHT
    for item in speech_frames:
        preflight_event = stt.SpeechEvent(
            type=stt.SpeechEventType.PREFLIGHT_TRANSCRIPT,
            alternatives=[
                item._as_speech_data(
                    self._stt._stt_options.speaker_active_format,
                    self._stt._stt_options.speaker_passive_format,
                ),
            ],
        )
        self._event_ch.send_nowait(preflight_event)
```
Configuration for SmartTurn
When creating the transcription config, enable SmartTurn:
```python
def _process_config(self) -> None:
    # ... existing code ...
    if self._stt_options.enable_smart_turn:
        transcription_config.smart_turn_config = SmartTurnConfig(
            enabled=True,
            smart_turn_threshold=self._stt_options.smart_turn_threshold,
        )
```
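On the agent side, I'd expect these knobs to be surfaced on the plugin constructor roughly like this (the parameter names are hypothetical, mirroring the proposed STTOptions fields):

```python
from livekit.plugins import speechmatics

# Hypothetical constructor arguments mirroring the proposed STTOptions fields
stt_impl = speechmatics.STT(
    enable_smart_turn=True,
    smart_turn_threshold=0.5,
)
```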
Thanks a bunch for considering this. @theomonnom - pinging you as I saw you're the dev behind the current plugin 😄
Workarounds / Alternatives
I've reverted to Deepgram for now; without preflight, things are too slow for real-time conversations.
Additional Context
No response