AWS Nova Sonic: scripted speech output so the agent can talk first on inbound calls
Feature Type
I cannot use LiveKit without it
Feature Description
Hi, I wanted to experiment with different real-time models and came across LiveKit, which seems great for easily swapping out the real-time voice model before run-time.
However, I noticed that, according to the docs, generate_reply in the Nova Sonic plugin doesn't support letting the agent talk first: https://docs.livekit.io/reference/python/v1/livekit/plugins/aws/experimental/realtime/index.html#livekit.plugins.aws.experimental.realtime.RealtimeSession.generate_reply
This means the LiveKit Nova Sonic plugin cannot support inbound calls, where the agent has to greet the caller before the caller says anything.
    def generate_reply(
        self,
        *,
        instructions: NotGivenOr[str] = NOT_GIVEN,
    ) -> asyncio.Future[llm.GenerationCreatedEvent]:
        logger.warning("unprompted generation is not supported by Nova Sonic's Realtime API")
        fut = asyncio.Future[llm.GenerationCreatedEvent]()
        fut.set_exception(
            llm.RealtimeError("unprompted generation is not supported by Nova Sonic's Realtime API")
        )
        return fut
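To make the effect of that method concrete, here is a minimal, self-contained sketch of what a caller would see. It does not import LiveKit: `RealtimeError` and `generate_reply_stub` are hypothetical stand-ins that only mimic the pattern quoted above (a future that already carries an exception), so treat it as an illustration rather than the plugin's actual code.

```python
import asyncio


class RealtimeError(Exception):
    """Hypothetical stand-in for llm.RealtimeError, so the sketch runs without LiveKit."""


def generate_reply_stub() -> "asyncio.Future[str]":
    # Mimics the quoted method: build a future and fail it immediately.
    fut = asyncio.get_running_loop().create_future()
    fut.set_exception(
        RealtimeError("unprompted generation is not supported by Nova Sonic's Realtime API")
    )
    return fut


async def greet_first() -> str:
    # An agent that tries to speak first ends up in the except branch.
    try:
        await generate_reply_stub()
        return "agent spoke first"
    except RealtimeError as exc:
        return f"fallback needed: {exc}"


result = asyncio.run(greet_first())
print(result)
```

Because the exception is already set when the future is returned, any code that awaits it fails right away, which is why an agent cannot open the conversation through this call.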
Previously, when I had a direct Nova Sonic integration built from the AWS docs, I found this article: https://repost.aws/questions/QU51ORzHxCSGiM446csqcY9Q/instead-of-user-trigger-first-i-need-the-nova-sonic-model-trigger-first-and-say-welcome-address-to-user-after-the-start-conversation-is-clicked
I followed the solution suggested there: the workaround is to send a silent (pre-recorded) audio clip to Sonic to initiate the conversation.
I recorded a 500 ms clip of me saying "hi" and sent it to Nova Sonic, so to the caller it seems like Sonic is talking first on the line.
Can you please implement this for generate_reply on Nova Sonic so it can handle inbound calls by talking first?
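A rough sketch of the clip-preparation half of that workaround, using only the Python standard library. Everything here is an assumption for illustration: the 16 kHz, 16-bit mono format, the 32 ms chunk size, and the helper names `make_placeholder_wav` and `wav_to_base64_chunks` are hypothetical, and the placeholder clip is plain silence where a real recording of "hi" would be loaded instead. How the chunks are actually sent to Nova Sonic is deliberately left out.

```python
import base64
import io
import wave

SAMPLE_RATE = 16_000  # assumed input sample rate; check the model's documented audio format
SAMPLE_WIDTH = 2      # bytes per sample (16-bit PCM), also an assumption
CHUNK_MS = 32         # hypothetical chunk duration


def make_placeholder_wav(duration_ms: int = 500) -> bytes:
    """Build an in-memory mono WAV of silence, standing in for a recorded "hi" clip."""
    n_frames = SAMPLE_RATE * duration_ms // 1000
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(SAMPLE_WIDTH)
        w.setframerate(SAMPLE_RATE)
        w.writeframes(b"\x00" * (n_frames * SAMPLE_WIDTH))
    return buf.getvalue()


def wav_to_base64_chunks(wav_bytes: bytes, chunk_ms: int = CHUNK_MS) -> list[str]:
    """Strip the WAV header and split the raw PCM into base64-encoded chunks."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as w:
        if w.getnchannels() != 1 or w.getsampwidth() != SAMPLE_WIDTH:
            raise ValueError("expected 16-bit mono PCM")
        rate = w.getframerate()
        pcm = w.readframes(w.getnframes())
    chunk_bytes = rate * chunk_ms // 1000 * SAMPLE_WIDTH
    return [
        base64.b64encode(pcm[i : i + chunk_bytes]).decode("ascii")
        for i in range(0, len(pcm), chunk_bytes)
    ]


chunks = wav_to_base64_chunks(make_placeholder_wav())
print(len(chunks), "chunks ready to send as the opening user audio")
```

In a real integration the chunks would be streamed as the first user audio turn right after the session starts, so the model's greeting is the first thing the caller hears.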
Workarounds / Alternatives
No response
Additional Context
No response
@ZafeerKhan I have the same problem currently. Can you send a code example of how you solved this with the workaround of sending a "hi" message? hahaha
Also tagging @longcw for visibility and for ideas on how best to solve this!
Just save yourself the time and use OpenAI Realtime haha. Trust me. The code I have for my direct AWS integration is ugly as hell, and I couldn't get my voice agent working with all the tools I wanted. I switched to OpenAI Realtime using the LiveKit architecture and it's much easier to build.
@ZafeerKhan I fixed it yesterday hahahahaha. Thanks a lot, I did basically the same :D