Integration of Nova Sonic (AWS multimodal)
Are there any plans to support the AWS Nova Sonic multimodal model?
https://docs.aws.amazon.com/nova/latest/userguide/speech.html
Yes please! It would be awesome to be able to switch between the OpenAI Realtime API, the Gemini Live API, and Amazon Nova Sonic realtime models. More context: https://aws.amazon.com/ai/generative-ai/nova/speech/
We are planning to add Nova as a supported speech-to-speech (s2s) model. The team has done some early work on the integration, and it's very doable.
That would be great!
@davidzhao Hi David, it looks like we're working on the same thing. I also have some working code I can share, and I'd like to collaborate with you on this workstream. What's the best way to contact you?
Can we get on a Zoom call tomorrow? Any time between 11 AM and 1 PM PST works for me. Based on your availability, I'll send you a Zoom invite.
@BumaldaOverTheWater94 yes! let's chat. My email is ___. I can chat on Friday, but let's coordinate over email :)
Hey @davidzhao! Just wanted to share an update: our team is also working on a LiveKit plugin for Nova Sonic. We've already built a version on an earlier LiveKit release (livekit 0.20.4 and livekit-agents 0.12.17) that works with Nova Sonic, and we're now developing a new version based on the latest LiveKit release. So far, tool use and conversations work smoothly, and we're currently focused on improving how interruptions are handled.
@riyageorge1, can you share what you've developed?
@davidzhao Could you please provide an update on the release of this plugin? It would be greatly appreciated if a beta version could be released soon, so we can start building with LiveKit and Nova Sonic.
Any update on the release of the plugin?
We have done similar work using the AWS SDK for Swift and AVAudioEngine to implement smooth playback, interruption, echo cancellation, and other features for Nova Sonic voice. See the following open-source code in the SwiftChat app:
https://github.com/aws-samples/swift-chat/blob/main/react-native/ios/Services/AudioManager.swift
Here is the result:
https://github.com/user-attachments/assets/ebf21b12-9c93-4d2e-a109-1d6484019838
I'm also looking forward to this feature in LiveKit; being able to implement it across multiple platforms would simplify a lot of work.
Hi all, thanks for your patience. The Nova Sonic plugin is now available as part of LiveKit Agents SDK v1.1.5. If you encounter any bugs, raise an issue and tag me.
Hi @BumaldaOverTheWater94, thanks for this integration!
Could you help me with a minimal working example?
I'm trying to use the livekit.agents module as follows:
from dotenv import load_dotenv
from livekit import agents
from livekit.agents import Agent, AgentSession, RoomInputOptions
from livekit.plugins import aws, noise_cancellation

load_dotenv(override=True)


class NovaSonicAgent(Agent):
    def __init__(self):
        super().__init__(
            instructions="You are a helpful voice assistant.",
        )


async def entrypoint(ctx: agents.JobContext):
    session = AgentSession(llm=aws.realtime.realtime_model.RealtimeModel())
    await session.start(
        room=ctx.room,
        agent=NovaSonicAgent(),
        room_input_options=RoomInputOptions(
            noise_cancellation=noise_cancellation.BVC(),
        ),
    )
    await ctx.connect()


if __name__ == "__main__":
    agents.cli.run_app(agents.WorkerOptions(entrypoint_fnc=entrypoint))
However, when I try to run it, I get the following error:
AttributeError: 'IndexError' object has no attribute 'message'
Let me know if I'm missing something in the setup or if this might be a bug. I'd really appreciate any guidance.
These are my library versions:
"livekit>=1.0.11",
"livekit-agents[google,openai,tavus]>=1.1.5",
"livekit-api>=1.0.3",
"livekit-plugins-aws[realtime]>=1.1.5",
"livekit-plugins-noise-cancellation~=0.2",
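The AttributeError above is a secondary failure: the message says something tried to read `.message` from an `IndexError`, and Python 3 exceptions have no such attribute, so the real cause (an out-of-range list index) gets masked. We haven't seen the handler code, so the sketch below only reproduces the pattern in plain Python; `describe_error` and `last_utterance` are hypothetical helper names, not plugin functions.

```python
def describe_error(exc: BaseException) -> str:
    """Return readable error text without assuming a `.message` attribute."""
    # Python 3 exceptions do not define `.message`; fall back to the type name and str(exc)
    message = getattr(exc, "message", None)
    return message if message else f"{type(exc).__name__}: {exc}"


def last_utterance(items: list) -> object:
    # Unguarded indexing: raises IndexError when the history is empty
    return items[-1]


try:
    last_utterance([])
except Exception as exc:
    # A handler that read `exc.message` here would raise AttributeError and hide the IndexError
    print(describe_error(exc))  # prints "IndexError: list index out of range"
```

Reading the error through `str(exc)` keeps the original IndexError visible, which points straight at the empty-history indexing discussed in the edits that follow.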
Edit:
The error starts at:
# note: user ASR text is slightly different than what is sent to LiveKit (newline vs whitespace) # noqa: E501
# TODO: fix this
self._update_chat_ctx(role="user", text_content=text_content)
Edit 2:
It works after adding the following code to the _update_chat_ctx function (aws.experimental.realtime.realtime_model.py, RealSession._update_chat_ctx):
prev_utterance = self._chat_ctx.items[-1] if self._chat_ctx.items else None
if not prev_utterance or not prev_utterance.content:
    # no previous utterance, so just add the new one
    self._chat_ctx.add_message(role=role, content=text_content)
    if len(self._chat_ctx.items) > MAX_MESSAGES:
        self._chat_ctx.truncate(max_items=MAX_MESSAGES)
    return
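The guard above works because `items[-1]` is only evaluated when the history is non-empty. Below is a self-contained sketch of that pattern using a hypothetical `ToyChatContext`, not LiveKit's `ChatContext`. The branch that appends to the previous message when the same speaker talks again is an assumption about what the rest of the real function does, and `MAX_MESSAGES = 40` is a made-up value.

```python
from dataclasses import dataclass, field

MAX_MESSAGES = 40  # hypothetical cap; the plugin's real value is not shown in this thread


@dataclass
class Utterance:
    role: str
    content: str


@dataclass
class ToyChatContext:
    """Hypothetical stand-in for a chat history, not LiveKit's ChatContext."""

    items: list[Utterance] = field(default_factory=list)

    def add_message(self, role: str, content: str) -> None:
        self.items.append(Utterance(role=role, content=content))

    def truncate(self, max_items: int) -> None:
        # Keep only the most recent `max_items` entries
        self.items = self.items[-max_items:]


def update_chat_ctx(ctx: ToyChatContext, role: str, text_content: str) -> None:
    # Guard: indexing items[-1] on an empty list is what raised the IndexError
    prev_utterance = ctx.items[-1] if ctx.items else None
    if not prev_utterance or not prev_utterance.content or prev_utterance.role != role:
        # No usable previous utterance (or a new speaker): start a new message
        ctx.add_message(role=role, content=text_content)
    else:
        # Same speaker again: append to the previous message (assumed behaviour)
        prev_utterance.content = f"{prev_utterance.content} {text_content}"
    if len(ctx.items) > MAX_MESSAGES:
        ctx.truncate(max_items=MAX_MESSAGES)
```

Calling `update_chat_ctx(ToyChatContext(), "user", "hello")` on a fresh history now adds one message instead of raising.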
Hi @MatheusRDG, thanks for trying out the plugin. This is actually a bug I noticed right after the v1.1.15 release. The latest code (https://github.com/livekit/agents/blob/main/livekit-plugins/livekit-plugins-aws/livekit/plugins/aws/experimental/realtime/realtime_model.py#L664) handles it, so running from source should fix the issue.
Use something like uv pip install -e . to install the source checkout as an editable dependency of your project.
Another option is to seed chat_ctx with some messages so you don't hit the index out-of-bounds error:
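from livekit.agents import Agent, ChatContext, llm

story = "Once upon a time..."  # hypothetical placeholder for a previously generated assistant reply


class Assistant(Agent):
    def __init__(self, tools: list[llm.FunctionTool | llm.RawFunctionTool]) -> None:
        # Seed the history so the realtime session never starts with an empty chat_ctx
        chat_ctx = ChatContext.empty()
        chat_ctx.add_message(role="user", content="hey sonic, tell me a children's story")
        chat_ctx.add_message(role="assistant", content=story)
        super().__init__(
            instructions="You are a helpful voice AI assistant.",
            tools=tools,
            chat_ctx=chat_ctx,
        )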
@BumaldaOverTheWater94 Both options are working for me. Really appreciate your work, thank you!
Does Nova-sonic support function tooling? Does it support RunContext?
Yes, function tool calling is supported. See https://github.com/livekit/agents/pull/2817 for an example.
Unfortunately, RunContext is not currently supported, because the RealtimeSession has no reference to the outer AgentSession and so cannot inject a RunContext.
Support for RunContext is a work in progress.
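To make the limitation concrete: a context object can only be injected into a tool call if whatever executes the tool holds a reference to the session that owns that context. The toy model below is a hypothetical illustration of that dependency (annotation-based injection via `typing.get_type_hints`); `ToyRunContext` and `ToyToolExecutor` are invented names, and this is not LiveKit's actual injection code.

```python
import asyncio
from dataclasses import dataclass
from typing import Any, Awaitable, Callable, Optional, get_type_hints


@dataclass
class ToyRunContext:
    """Hypothetical stand-in for LiveKit's RunContext (not its real API)."""

    session_id: str


class ToyToolExecutor:
    def __init__(self, session_id: Optional[str] = None) -> None:
        # session_id is None when the executor has no link to the outer session
        self.session_id = session_id

    async def call(self, tool: Callable[..., Awaitable[Any]], **llm_args: Any) -> Any:
        kwargs = dict(llm_args)
        # Inject a context for every parameter annotated with ToyRunContext
        for name, hint in get_type_hints(tool).items():
            if hint is ToyRunContext:
                if self.session_id is None:
                    raise RuntimeError(f"cannot inject {name!r}: no session reference")
                kwargs[name] = ToyRunContext(session_id=self.session_id)
        return await tool(**kwargs)


async def get_weather(ctx: ToyRunContext, location: str) -> str:
    # Example tool: the model supplies `location`, the executor supplies `ctx`
    return f"[{ctx.session_id}] sunny in {location}"


print(asyncio.run(ToyToolExecutor("s1").call(get_weather, location="Paris")))
```

With a session reference the call succeeds; an executor created without one (the situation described above) can only fail when a tool asks for the context.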
@mridulrao RunContext support has been added with v1.1.16
Marking this issue as completed.