
Integration of Nova Sonic (AWS multimodal)

Open kailashsp opened this issue 11 months ago • 1 comments

Is there a planned release for AWS Nova Sonic multimodal support?

https://docs.aws.amazon.com/nova/latest/userguide/speech.html

kailashsp avatar Apr 22 '25 20:04 kailashsp

Yes please! It would be awesome to be able to switch between the OpenAI Realtime API, Gemini Live API and Amazon Nova Sonic realtime models. More context: https://aws.amazon.com/ai/generative-ai/nova/speech/

timopetric avatar Apr 29 '25 13:04 timopetric

we are planning to add Nova as a s2s model that we'll support. the team has done some early work in the integration. It's very do-able

davidzhao avatar May 28 '25 00:05 davidzhao

> we are planning to add Nova as a s2s model that we'll support. the team has done some early work in the integration. It's very do-able

that would be great

dariusteep avatar May 28 '25 09:05 dariusteep

@davidzhao Hi David, it looks like we are working on the same thing. I also have some working code that I can share. I would like to collaborate with you on this workstream. What would be the best way to contact you?

Can we get on a Zoom call tomorrow? Any time between 11 AM and 1 PM PST works for me. Based on your availability, I will send you a Zoom invite.

BumaldaOverTheWater94 avatar May 28 '25 19:05 BumaldaOverTheWater94

@BumaldaOverTheWater94 yes! let's chat. My email is ___. I can chat on Friday, but let's coordinate over email :)

davidzhao avatar May 29 '25 07:05 davidzhao

Hey! @davidzhao Just wanted to share an update: our team is also working on a LiveKit plugin for Nova Sonic. We've already built a version using an earlier release of LiveKit (LiveKit version 0.20.4 and livekit-agents version 0.12.17) that works seamlessly with Nova Sonic, and now we're developing a new version based on the latest LiveKit update. So far, we're able to use tools and have smooth conversations, and we're currently focusing on improving how interruptions are handled during conversations.

riyageorge1 avatar Jun 04 '25 04:06 riyageorge1

@riyageorge1, can you share what you've developed?

Kruhlikau avatar Jun 04 '25 14:06 Kruhlikau

@davidzhao Could you please provide an update on the release of this plugin? It would be greatly appreciated if a beta version could be released soon, allowing us to start working on LiveKit and Nova Sonic.

12121vishnu avatar Jun 09 '25 08:06 12121vishnu

Any update on the release of the plugin?

timopetric avatar Jun 24 '25 13:06 timopetric

We have done some similar work using the AWS SDK for Swift and AVAudioEngine to implement smooth playback, interruption, echo cancellation and other features for Nova Sonic voice. Refer to the following open source code in the SwiftChat app:

https://github.com/aws-samples/swift-chat/blob/main/react-native/ios/Services/AudioManager.swift

Results as follows:

https://github.com/user-attachments/assets/ebf21b12-9c93-4d2e-a109-1d6484019838

I also look forward to this feature being implemented in LiveKit. Being able to implement it across multiple platforms will indeed simplify a lot of work.

zhu-xiaowei avatar Jun 26 '25 08:06 zhu-xiaowei

Hi all, thanks for your patience. The Nova Sonic plugin is now available as part of LiveKit Agents SDK v1.1.5. If you encounter any bugs, raise an issue and tag me.

BumaldaOverTheWater94 avatar Jun 30 '25 17:06 BumaldaOverTheWater94

Hi @BumaldaOverTheWater94, thanks for this integration

Could you help me with a minimal working example?

I'm trying to use the livekit.agents module as follows:

from dotenv import load_dotenv

from livekit import agents
from livekit.agents import Agent, AgentSession, RoomInputOptions
from livekit.plugins import aws, noise_cancellation

load_dotenv(override=True)

class NovaSonicAgent(Agent):
    def __init__(self):
        super().__init__(
            instructions="You are a helpful voice assistant.",
        )

async def entrypoint(ctx: agents.JobContext):
    session = AgentSession(llm=aws.realtime.realtime_model.RealtimeModel())

    await session.start(
        room=ctx.room,
        agent=NovaSonicAgent(),
        room_input_options=RoomInputOptions(
            noise_cancellation=noise_cancellation.BVC(),
        ),
    )

    await ctx.connect()

if __name__ == "__main__":
    agents.cli.run_app(agents.WorkerOptions(entrypoint_fnc=entrypoint))

However, when I try to run it, I get the following error:

AttributeError: 'IndexError' object has no attribute 'message'
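
A note on the error message itself: in Python 3, built-in exceptions such as IndexError do not define a `.message` attribute, so an error handler that reads `exc.message` raises an AttributeError that hides the original failure. The sketch below reproduces that pattern in isolation. `handle_event`, `describe_error_unsafe` and `describe_error` are hypothetical names used for illustration only; they are not part of the LiveKit plugin, and this is only a guess at the shape of the failing code path.

```python
def handle_event(items: list[str]) -> str:
    """Hypothetical handler that reads the last item of a possibly empty list."""
    return items[-1]  # raises IndexError when items is empty


def describe_error_unsafe(exc: Exception) -> str:
    # Python 3 exceptions have no .message attribute, so this line
    # raises AttributeError and masks the original exception.
    return exc.message  # type: ignore[attr-defined]


def describe_error(exc: Exception) -> str:
    # str(exc) is defined for every exception type.
    return f"{type(exc).__name__}: {exc}"


if __name__ == "__main__":
    try:
        handle_event([])
    except IndexError as exc:
        try:
            describe_error_unsafe(exc)
        except AttributeError as masked:
            print(f"masked error: {masked}")
        print(f"real error: {describe_error(exc)}")
```

Seeing this AttributeError in a traceback therefore usually means the underlying problem is the IndexError, not the attribute access.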

Let me know if I’m missing something in the setup or if this might be a bug. I’d really appreciate any guidance

These are my library versions:

    "livekit>=1.0.11",
    "livekit-agents[google,openai,tavus]>=1.1.5",
    "livekit-api>=1.0.3",
    "livekit-plugins-aws[realtime]>=1.1.5",
    "livekit-plugins-noise-cancellation~=0.2",

Edit:

The error starts at:

                # note: user ASR text is slightly different than what is sent to LiveKit (newline vs whitespace)  # noqa: E501
                # TODO: fix this
                self._update_chat_ctx(role="user", text_content=text_content)

Edit 2:

It works after adding the following code to the _update_chat_ctx function (aws.experimental.realtime.realtime_model.py RealSession._update_chat_ctx):

        prev_utterance = self._chat_ctx.items[-1] if self._chat_ctx.items else None

        if not prev_utterance or not prev_utterance.content:
            # no previous utterance, so just add the new one
            self._chat_ctx.add_message(role=role, content=text_content)
            if len(self._chat_ctx.items) > MAX_MESSAGES:
                self._chat_ctx.truncate(max_items=MAX_MESSAGES)
            return
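
The guard above can be tried out in isolation with a small stand-in. This is a minimal sketch, not the plugin's code: `ChatLog`, `Message` and the value of `MAX_MESSAGES` are hypothetical, and merging consecutive utterances from the same role is an assumption about what the rest of the function does.

```python
from dataclasses import dataclass, field

MAX_MESSAGES = 4  # assumed cap for illustration; the plugin's value may differ


@dataclass
class Message:
    role: str
    content: str


@dataclass
class ChatLog:
    """Hypothetical stand-in for a chat context; not the LiveKit ChatContext API."""

    items: list[Message] = field(default_factory=list)

    def update(self, role: str, text: str) -> None:
        # Guard: indexing items[-1] on an empty list would raise IndexError.
        prev = self.items[-1] if self.items else None

        if prev is None or not prev.content:
            # No usable previous utterance, so just add the new one.
            self.items.append(Message(role, text))
        elif prev.role == role:
            # Assumption: consecutive text from the same role is merged.
            prev.content = f"{prev.content} {text}"
        else:
            self.items.append(Message(role, text))

        # Keep only the most recent MAX_MESSAGES entries.
        if len(self.items) > MAX_MESSAGES:
            del self.items[:-MAX_MESSAGES]


if __name__ == "__main__":
    log = ChatLog()
    log.update("user", "hello")  # would fail without the guard
    log.update("user", "there")
    log.update("assistant", "hi")
    print([(m.role, m.content) for m in log.items])
```

The conditional expression `self.items[-1] if self.items else None` is what makes the first call on an empty history safe.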

MatheusRDG avatar Jul 01 '25 19:07 MatheusRDG

Hi @MatheusRDG, thanks for trying out the plugin. This is a bug that I noticed right after the v1.1.5 release. If you look at the latest code (https://github.com/livekit/agents/blob/main/livekit-plugins/livekit-plugins-aws/livekit/plugins/aws/experimental/realtime/realtime_model.py#L664) and run from there, that should handle the issue.

Use something like uv pip install -e . to link the source code as a dependency to your working project.

BumaldaOverTheWater94 avatar Jul 01 '25 23:07 BumaldaOverTheWater94

One other option is to add some messages to the chat_ctx so you don't encounter the index out-of-bounds error:

from livekit.agents import Agent, ChatContext, llm

# placeholder text; `story` was not defined in the original snippet
story = "Once upon a time..."

class Assistant(Agent):
    def __init__(self, tools: list[llm.FunctionTool | llm.RawFunctionTool]) -> None:
        # seed the chat context so the session never starts with an empty history
        chat_ctx = ChatContext.empty()
        chat_ctx.add_message(role="user", content="hey sonic, tell me a children's story")
        chat_ctx.add_message(role="assistant", content=story)

        super().__init__(
            instructions="You are a helpful voice AI assistant.",
            tools=tools,
            chat_ctx=chat_ctx,
        )

BumaldaOverTheWater94 avatar Jul 01 '25 23:07 BumaldaOverTheWater94

@BumaldaOverTheWater94 Both options are working for me. Really appreciate your work, thank you!

MatheusRDG avatar Jul 01 '25 23:07 MatheusRDG

Does Nova-sonic support function tooling? Does it support RunContext?

mridulrao avatar Jul 06 '25 19:07 mridulrao

> Does Nova-sonic support function tooling? Does it support RunContext?

Yes, function tool calling is supported. See https://github.com/livekit/agents/pull/2817 for an example.

Unfortunately RunContext is not currently supported as the RealtimeSession does not have knowledge of the outside AgentSession to inject RunContext. Adding support for RunContext is WIP.

BumaldaOverTheWater94 avatar Jul 08 '25 19:07 BumaldaOverTheWater94

@mridulrao RunContext support has been added with v1.1.16

BumaldaOverTheWater94 avatar Jul 10 '25 18:07 BumaldaOverTheWater94

marking this issue as completed

BumaldaOverTheWater94 avatar Jul 22 '25 17:07 BumaldaOverTheWater94