AWS Nova Sonic Scripted Speech Output for Talking first for Inbound Calls

Open ZafeerKhan opened this issue 4 weeks ago • 4 comments

Feature Type

I cannot use LiveKit without it

Feature Description

Hi, I wanted to experiment with different real-time models and came across LiveKit, which seems great for easily swapping out the real-time voice model before run time.

However, I noticed that for Nova Sonic, generate_reply does not support letting the agent talk first, according to the docs: https://docs.livekit.io/reference/python/v1/livekit/plugins/aws/experimental/realtime/index.html#livekit.plugins.aws.experimental.realtime.RealtimeSession.generate_reply

This means the LiveKit Nova Sonic plugin cannot support inbound calls, where the agent has to speak first. The documented source is:

def generate_reply(
    self,
    *,
    instructions: NotGivenOr[str] = NOT_GIVEN,
) -> asyncio.Future[llm.GenerationCreatedEvent]:
    logger.warning("unprompted generation is not supported by Nova Sonic's Realtime API")
    fut = asyncio.Future[llm.GenerationCreatedEvent]()
    fut.set_exception(
        llm.RealtimeError("unprompted generation is not supported by Nova Sonic's Realtime API")
    )
    return fut
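
To make the consequence concrete, here is a minimal sketch of what a caller sees. `RealtimeError`, `generate_reply` and `greet_first` below are local stand-ins that only mimic the snippet above; they are not imported from LiveKit. The point is that the returned future has already failed, so awaiting it raises immediately and the agent never greets the caller.

```python
import asyncio


class RealtimeError(Exception):
    """Local stand-in for llm.RealtimeError (an assumption, not the LiveKit class)."""


def generate_reply() -> asyncio.Future:
    # Mirrors the plugin method above: hand back a future that has already failed.
    fut = asyncio.get_running_loop().create_future()
    fut.set_exception(RealtimeError("unprompted generation is not supported"))
    return fut


async def greet_first() -> str:
    try:
        await generate_reply()
        return "agent spoke first"
    except RealtimeError as exc:
        # With Nova Sonic, an agent that tries to open the call ends up here.
        return f"no greeting: {exc}"


result = asyncio.run(greet_first())
print(result)
```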

Previously, when I had a direct Nova Sonic integration built from the AWS docs, I found this article: https://repost.aws/questions/QU51ORzHxCSGiM446csqcY9Q/instead-of-user-trigger-first-i-need-the-nova-sonic-model-trigger-first-and-say-welcome-address-to-user-after-the-start-conversation-is-clicked

and I followed its solution: the workaround is to send a silent (pre-recorded) audio clip to Sonic to initiate the conversation. I recorded a clip of myself saying "hi" (about 500 ms) and sent it to Nova Sonic, so that it seems like Sonic is talking first on the line.
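
A rough sketch of the clip-priming idea, not the code from my integration. It assumes Nova Sonic takes 16 kHz, 16-bit mono PCM as base64 inside `audioInput` events with `promptName` and `contentName` fields; treat that event shape, the chunk size, and the helper names `make_clip` and `audio_events` as assumptions to check against the AWS docs. The synthesized tone is only a placeholder for a real recorded "hi", and actually sending the events over the stream is left out.

```python
import base64
import json
import math
import struct

SAMPLE_RATE = 16_000  # assumed input format: 16 kHz, 16-bit, mono PCM
CHUNK_BYTES = 1024    # assumed chunk size per event


def make_clip(duration_ms: int = 500, freq_hz: float = 220.0) -> bytes:
    """Synthesize a placeholder tone; a real setup would load a recorded 'hi'."""
    n = SAMPLE_RATE * duration_ms // 1000
    samples = (
        int(8000 * math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE))
        for i in range(n)
    )
    # Little-endian signed 16-bit samples.
    return b"".join(struct.pack("<h", s) for s in samples)


def audio_events(pcm: bytes, prompt_name: str, content_name: str) -> list[str]:
    """Split the clip into chunks and wrap each one as a JSON event string."""
    events = []
    for start in range(0, len(pcm), CHUNK_BYTES):
        chunk = pcm[start:start + CHUNK_BYTES]
        events.append(json.dumps({"event": {"audioInput": {
            "promptName": prompt_name,      # hypothetical session identifiers
            "contentName": content_name,
            "content": base64.b64encode(chunk).decode("ascii"),
        }}}))
    return events


clip = make_clip()
events = audio_events(clip, "prompt-1", "audio-1")
print(len(clip), "bytes in", len(events), "events")
```

The idea is to push these events right after the session starts, before any caller audio is forwarded, so the model's first response plays as the greeting.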

Can you please implement this for generate_reply on Nova Sonic so that it can handle inbound calls by talking first?

Workarounds / Alternatives

No response

Additional Context

No response

ZafeerKhan, Nov 11 '25 14:11

@ZafeerKhan I currently have the same problem. Can you send a code example of how you solved this with the workaround of sending a "hi" message? hahaha

Shekswess, Nov 25 '25 08:11

Also tagging @longcw for visibility and for ideas on how best to solve this!

Shekswess, Nov 25 '25 08:11

Just save yourself the time and use OpenAI Realtime, haha. Trust me. The code I have for my direct AWS integration is ugly as hell, and I couldn't get my voice agent working with all the tools I wanted. I switched to OpenAI Realtime using the LiveKit architecture and it's much easier to build.

ZafeerKhan, Nov 26 '25 04:11

@ZafeerKhan I fixed it yesterday, hahahahaha. Thanks a lot, I did basically the same :D

Shekswess, Nov 26 '25 11:11