
Add PREFLIGHT_TRANSCRIPT STT support to the Speechmatics plugin

deanihansen opened this issue 1 month ago • 1 comment

Feature Type

Would make my life easier

Feature Description

The current LiveKit Speechmatics plugin does not emit PREFLIGHT_TRANSCRIPT events, which prevents the preemptive generation feature from working; in ADAPTIVE mode, end-of-utterance detection just waits for the timer to expire before finishing.

Here's my mental model of how preemptive generation works.

The feature works by:

  1. STT emits PREFLIGHT_TRANSCRIPT with stable partial text
  2. LiveKit's AgentSession starts LLM inference speculatively
  3. When FINAL_TRANSCRIPT arrives, if it matches the preflight, the LLM response is already partially generated
  4. If the final differs, the speculative generation is discarded
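The four steps above can be sketched as a toy model. This is not LiveKit's actual implementation — `PreemptiveSession`, `SpeculativeRun`, and the match-by-exact-text rule are all illustrative assumptions — but it shows the reuse-or-discard decision:

```python
from dataclasses import dataclass
from typing import Optional

# Toy model of preemptive generation; all names here are hypothetical,
# not LiveKit's real API.

@dataclass
class SpeculativeRun:
    prompt: str
    partial_response: str

class PreemptiveSession:
    def __init__(self) -> None:
        self._speculative: Optional[SpeculativeRun] = None

    def on_preflight_transcript(self, text: str) -> None:
        # Steps 1-2: a stable partial arrives, so start LLM inference early.
        self._speculative = SpeculativeRun(
            prompt=text,
            partial_response=f"<llm output for {text!r}>",
        )

    def on_final_transcript(self, text: str) -> str:
        # Steps 3-4: reuse the speculative run only if the final matches it.
        if self._speculative is not None and self._speculative.prompt == text:
            run, self._speculative = self._speculative, None
            return run.partial_response  # already (partially) generated
        self._speculative = None  # mismatch: discard the speculative run
        return f"<llm output for {text!r}>"

session = PreemptiveSession()
session.on_preflight_transcript("book a table for two")
hit = session.on_final_transcript("book a table for two")     # reused
session.on_preflight_transcript("book a table")
miss = session.on_final_transcript("book a table for four")   # discarded
```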

Current State

The livekit-plugins-speechmatics plugin only emits:

  • INTERIM_TRANSCRIPT - For UI visualization
  • FINAL_TRANSCRIPT - When utterance is complete
  • END_OF_SPEECH - After final transcript
  • RECOGNITION_USAGE - For billing/metrics

It does not emit PREFLIGHT_TRANSCRIPT, so preemptive generation never triggers.

How Other Providers Implement This

Deepgram (livekit-plugins-deepgram/stt_v2.py):

# Emits PREFLIGHT_TRANSCRIPT on TurnStarted event
elif event_type == "TurnStarted":
    if not self._speaking:
        return
    self._send_transcript_event(stt.SpeechEventType.PREFLIGHT_TRANSCRIPT, data)

AssemblyAI (livekit-plugins-assemblyai/stt.py):

# Emits PREFLIGHT_TRANSCRIPT based on confidence threshold
final_event = stt.SpeechEvent(
    type=stt.SpeechEventType.PREFLIGHT_TRANSCRIPT,
    alternatives=[
        stt.SpeechData(
            language="en",
            text=preflight_text,
            confidence=avg_confidence,
        )
    ],
)

The Problem Today

Issue 1: No PREFLIGHT_TRANSCRIPT Emission

In stt.py lines 571-574:

if not finalized:
    event_type = stt.SpeechEventType.INTERIM_TRANSCRIPT  # Always interim
else:
    event_type = stt.SpeechEventType.FINAL_TRANSCRIPT    # Always final

There's no code path to emit PREFLIGHT_TRANSCRIPT.
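A third branch would be needed between "interim" and "final". Sketched with stand-in types (the `SpeechEventType` enum here is a stub, and the `turn_predicted` flag is a hypothetical hint derived from a turn-prediction signal, not an existing plugin field):

```python
import enum

class SpeechEventType(enum.Enum):  # stand-in for stt.SpeechEventType
    INTERIM_TRANSCRIPT = "interim"
    PREFLIGHT_TRANSCRIPT = "preflight"
    FINAL_TRANSCRIPT = "final"

def pick_event_type(finalized: bool, turn_predicted: bool) -> SpeechEventType:
    if finalized:
        return SpeechEventType.FINAL_TRANSCRIPT
    if turn_predicted:
        # New branch: a stable partial plus a predicted end of turn
        # would be emitted as PREFLIGHT instead of plain INTERIM.
        return SpeechEventType.PREFLIGHT_TRANSCRIPT
    return SpeechEventType.INTERIM_TRANSCRIPT
```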

Issue 2: ADAPTIVE Mode Not Fully Implemented

The END_OF_UTTERANCE handler is only registered for FIXED mode (lines 472-476):

if opts.end_of_utterance_mode == EndOfUtteranceMode.FIXED:
    @self._client.on(ServerMessageType.END_OF_UTTERANCE)
    def _evt_on_end_of_utterance(message: dict[str, Any]) -> None:
        self._handle_end_of_utterance()

ADAPTIVE mode doesn't register this handler, meaning EOU detection falls back to the timer only.
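The fix could be as small as dropping the FIXED-only guard. A self-contained sketch using stub types (`FakeClient` stands in for the Speechmatics client; only the guard logic is the point):

```python
import enum

class EndOfUtteranceMode(enum.Enum):  # mirrors the plugin's enum
    FIXED = "fixed"
    ADAPTIVE = "adaptive"

class FakeClient:
    """Stub with the same decorator-style .on() registration shape."""
    def __init__(self) -> None:
        self.handlers: dict[str, list] = {}

    def on(self, message_type: str):
        def register(fn):
            self.handlers.setdefault(message_type, []).append(fn)
            return fn
        return register

def register_eou(client: FakeClient, mode: EndOfUtteranceMode, on_eou) -> None:
    # Proposed: register the END_OF_UTTERANCE handler for both modes,
    # so ADAPTIVE gets server-driven EOU events instead of timer fallback.
    if mode in (EndOfUtteranceMode.FIXED, EndOfUtteranceMode.ADAPTIVE):
        @client.on("EndOfUtterance")
        def _evt(message: dict) -> None:
            on_eou()

client = FakeClient()
fired: list = []
register_eou(client, EndOfUtteranceMode.ADAPTIVE, lambda: fired.append(True))
client.handlers["EndOfUtterance"][0]({})  # simulate a server message
```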

Issue 3: conversation_config Only Set for FIXED Mode

Lines 387-393:

if (
    self._stt_options.end_of_utterance_silence_trigger
    and self._stt_options.end_of_utterance_mode == EndOfUtteranceMode.FIXED
):
    transcription_config.conversation_config = ConversationConfig(...)

ADAPTIVE mode should also configure conversation_config with appropriate settings.
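Again the change is mostly widening the guard. A minimal sketch with a stub `ConversationConfig` (the real SDK class and its exact ADAPTIVE-appropriate settings are an assumption here):

```python
import enum
from dataclasses import dataclass
from typing import Optional

class EndOfUtteranceMode(enum.Enum):  # mirrors the plugin's enum
    FIXED = "fixed"
    ADAPTIVE = "adaptive"

@dataclass
class ConversationConfig:  # stand-in for the Speechmatics SDK class
    end_of_utterance_silence_trigger: float

def build_conversation_config(
    mode: EndOfUtteranceMode, silence_trigger: float
) -> Optional[ConversationConfig]:
    # Proposed: both FIXED and ADAPTIVE get a conversation_config;
    # today only FIXED does.
    if silence_trigger and mode in (
        EndOfUtteranceMode.FIXED,
        EndOfUtteranceMode.ADAPTIVE,
    ):
        return ConversationConfig(
            end_of_utterance_silence_trigger=silence_trigger
        )
    return None
```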

Proposed Solution: Add PREFLIGHT Support

Use Speechmatics' native TURN_PREDICTION signal: Speechmatics already produces the signals needed - they're just not being used by the plugin yet.

Available Signals (Currently Unused)

| Signal | Description |
| --- | --- |
| TURN_PREDICTION | Predicts turn ending with TTL and reasoning |
| SMART_TURN_RESULT | Acoustic turn-completion prediction |
| SmartTurnConfig | Enables smart turn with a configurable threshold |
| VAD_STATUS | Speech probability (float) |
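As one example of how an unused signal could feed the preflight decision, VAD_STATUS (a speech probability) could gate emission. This is a toy predicate, not plugin code; the function name and threshold default are hypothetical:

```python
def should_emit_preflight(
    vad_probability: float,
    has_partial_text: bool,
    threshold: float = 0.5,
) -> bool:
    # Emit a preflight only when there is partial text and VAD_STATUS
    # still reports speech probability at or above an (assumed) threshold.
    return has_partial_text and vad_probability >= threshold
```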

Recommended Implementation

# In STTOptions, add:
@dataclasses.dataclass
class STTOptions:
    # ... existing fields ...
    enable_smart_turn: bool = True
    smart_turn_threshold: float = 0.5

# In _run(), register TURN_PREDICTION handler:
if opts.enable_smart_turn:
    @self._client.on(ServerMessageType.TURN_PREDICTION)  # or appropriate message type
    def _evt_on_turn_prediction(message: dict[str, Any]) -> None:
        # When we predict turn is ending, emit current partial as PREFLIGHT
        self._handle_turn_prediction(message)

def _handle_turn_prediction(self, message: dict[str, Any]) -> None:
    """Emit PREFLIGHT_TRANSCRIPT when turn prediction received."""
    speech_frames = self._get_frames_from_fragments()
    if not speech_frames or not any(frame.is_active for frame in speech_frames):
        return

    # Emit current transcript as PREFLIGHT
    for item in speech_frames:
        preflight_event = stt.SpeechEvent(
            type=stt.SpeechEventType.PREFLIGHT_TRANSCRIPT,
            alternatives=[
                item._as_speech_data(
                    self._stt._stt_options.speaker_active_format,
                    self._stt._stt_options.speaker_passive_format,
                ),
            ],
        )
        self._event_ch.send_nowait(preflight_event)

Configuration for SmartTurn

When creating the transcription config, enable SmartTurn:

def _process_config(self) -> None:
    # ... existing code ...

    if self._stt_options.enable_smart_turn:
        transcription_config.smart_turn_config = SmartTurnConfig(
            enabled=True,
            smart_turn_threshold=self._stt_options.smart_turn_threshold,
        )

Thanks a bunch for considering this. @theomonnom - pinging you as I saw you're the dev behind the current plugin 😄

Workarounds / Alternatives

I've reverted to Deepgram for now; without preflight, things are too slow for real-time conversations.

Additional Context

No response

deanihansen · Dec 18 '25 15:12