Feature Request: Utterance timestamps in the ChatContext or Transcript
I want to capture word-level, or at least utterance-level, timestamps for both the user and agent transcripts and store them in the ChatContext. The use case is to use the ChatContext as the transcript for post-processing, for example displaying calls in a UI, redaction, or summarization.
Currently, the SDK exposes only the text of the user transcript; utterance-level and word-level timestamps are NOT exposed at all.
The agent's transcript does not have timestamps either, even though the ElevenLabs and Cartesia TTS APIs support them.
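As a rough sketch of the shape such timestamp data could take for post-processing like redaction: every name below (`WordTiming`, `TimedUtterance`, `redaction_spans`) is hypothetical and is NOT part of the livekit-agents SDK or its ChatContext. It only illustrates what per-word timing attached to a transcript entry might look like.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set, Tuple


@dataclass
class WordTiming:
    """Hypothetical per-word timing; times are seconds from the start of the call."""
    text: str
    start: float
    end: float


@dataclass
class TimedUtterance:
    """Hypothetical transcript entry carrying timestamps (not an SDK type)."""
    role: str  # "user" or "assistant"
    text: str
    words: List[WordTiming] = field(default_factory=list)

    @property
    def start(self) -> Optional[float]:
        # Utterance-level start falls back to None when no word timings exist.
        return self.words[0].start if self.words else None

    @property
    def end(self) -> Optional[float]:
        return self.words[-1].end if self.words else None


def redaction_spans(utt: TimedUtterance, banned: Set[str]) -> List[Tuple[float, float]]:
    """Return (start, end) audio ranges for words that should be redacted."""
    return [
        (w.start, w.end)
        for w in utt.words
        if w.text.lower().strip(".,!?") in banned
    ]


utterance = TimedUtterance(
    role="user",
    text="my pin is 1234",
    words=[
        WordTiming("my", 0.0, 0.2),
        WordTiming("pin", 0.25, 0.5),
        WordTiming("is", 0.55, 0.7),
        WordTiming("1234", 0.8, 1.1),
    ],
)
spans = redaction_spans(utterance, {"1234"})
```

With word-level timings, a redaction step can map sensitive words back to audio ranges; with only utterance-level timings, the `start` and `end` properties would still be enough for displaying a call timeline in a UI.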
Has anyone found a way to get this timestamp data from the livekit-agents SDK?
I am happy to submit a PR to add this feature. Would the PR be merged if I added this feature?