
livekit-plugin-elevenlabs TTS stalls when selecting a custom voice.

Open • Don-Chad opened this issue 4 months ago • 5 comments

I'm having a really hard time getting the ElevenLabs TTS to work with a dedicated/cloned voice. I'm using the voice-pipeline-agent-python example, which does work with the standard openai.TTS voice.

elevenlabs.TTS with the default voice, configured with only the API key and model, also works:


assistant = VoicePipelineAgent(
    vad=ctx.proc.userdata["vad"],
    stt=openai.STT(),
    llm=openai.LLM(model="Meta-Llama-3.1-8B-Instruct", api_key="xxxx"),
    tts=elevenlabs.TTS(model_id="eleven_turbo_v2_5", api_key=os.getenv("ELEVEN_API_KEY")),
    chat_ctx=initial_ctx,
)

Then, if I try to use a custom cloned voice via voice=Voice(id="...."), the client never initializes audio (it keeps waiting for the ElevenLabs audio) and times out. There don't seem to be any error messages.
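Since nothing shows up in the logs, one thing that might help narrow this down is logging any exception raised inside the entrypoint before it propagates. The following is only a minimal sketch: `log_exceptions` is a hypothetical helper (not part of livekit), and whether the worker actually hides an exception here is an assumption, not something confirmed.

```python
import asyncio
import functools
import logging

logger = logging.getLogger("voice-agent")


def log_exceptions(fn):
    """Wrap an async function so any exception is logged with a traceback.

    Hypothetical helper for debugging; the exception is re-raised so the
    caller's behavior is otherwise unchanged.
    """

    @functools.wraps(fn)
    async def wrapper(*args, **kwargs):
        try:
            return await fn(*args, **kwargs)
        except Exception:
            # logger.exception records the message plus the full traceback
            logger.exception("unhandled error in %s", fn.__name__)
            raise

    return wrapper


if __name__ == "__main__":
    logging.basicConfig(level=logging.DEBUG)

    @log_exceptions
    async def demo():
        # Stand-in for an entrypoint that fails while building its components
        raise TypeError("simulated constructor failure")

    try:
        asyncio.run(demo())
    except TypeError:
        pass  # already logged by the wrapper
```

Applied as a decorator on `entrypoint`, this would at least show whether constructing the agent raises before any audio is requested.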

Help much appreciated!

Code:

import logging
import os

from dotenv import load_dotenv
from livekit.agents import (
    AutoSubscribe,
    JobContext,
    JobProcess,
    WorkerOptions,
    cli,
    llm,
)

from livekit.agents.pipeline import VoicePipelineAgent
from livekit.plugins import openai, deepgram, silero, elevenlabs
from livekit.plugins.elevenlabs import TTS, Voice, VoiceSettings


load_dotenv(dotenv_path=".env.local")
logger = logging.getLogger("voice-agent")


def prewarm(proc: JobProcess):
    proc.userdata["vad"] = silero.VAD.load()


async def entrypoint(ctx: JobContext):
    initial_ctx = llm.ChatContext().append(
        role="system",
        text=(
            """You are a helpful assistant """
        ),
    )

    logger.info(f"connecting to room {ctx.room.name}")
    await ctx.connect(auto_subscribe=AutoSubscribe.AUDIO_ONLY)

    # Wait for the first participant to connect
    participant = await ctx.wait_for_participant()
    logger.info(f"starting voice assistant for participant {participant.identity}")


    assistant = VoicePipelineAgent(
        vad=ctx.proc.userdata["vad"],
        stt=openai.STT(),
        llm=openai.LLM(model="Meta-Llama-3.1-8B-Instruct", api_key="xxxx"),
        tts=elevenlabs.TTS(
            voice=Voice(id="xxx", settings=VoiceSettings(stability=0.71, similarity_boost=0.5)),
            model="eleven_turbo_v2_5",
            api_key=os.getenv("ELEVEN_API_KEY"),
        ),
        chat_ctx=initial_ctx,
    )

    assistant.start(ctx.room, participant)

    # The agent should be polite and greet the user when it joins :)
    await assistant.say("hi i am an assistant..", allow_interruptions=True)


if __name__ == "__main__":
    cli.run_app(
        WorkerOptions(
            entrypoint_fnc=entrypoint,
            prewarm_fnc=prewarm,
        ),
    )
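A possible thing to check in the code above is whether `Voice(id=..., settings=...)` is a complete constructor call. The sketch below assumes `Voice` is a dataclass; `required_fields` is a hypothetical helper, and `StandInVoice` / `StandInVoiceSettings` are stand-ins that only guess at the plugin's field layout (in particular, that `name` and `category` have no defaults). They are not the plugin's real classes.

```python
from __future__ import annotations

import dataclasses
from dataclasses import dataclass


def required_fields(cls: type) -> list[str]:
    """Return the names of dataclass fields that have no default value."""
    return [
        f.name
        for f in dataclasses.fields(cls)
        if f.default is dataclasses.MISSING
        and f.default_factory is dataclasses.MISSING
    ]


# Stand-ins for illustration only; the field layout is an assumption.
@dataclass
class StandInVoiceSettings:
    stability: float
    similarity_boost: float


@dataclass
class StandInVoice:
    id: str
    name: str  # assumed to be required
    category: str  # assumed to be required
    settings: StandInVoiceSettings | None = None


if __name__ == "__main__":
    print(required_fields(StandInVoice))
    try:
        # Same shape as the call in the issue: only id and settings are given
        StandInVoice(
            id="xxx",
            settings=StandInVoiceSettings(stability=0.71, similarity_boost=0.5),
        )
    except TypeError as exc:
        print(f"construction failed: {exc}")
```

Running `required_fields(Voice)` against the real `livekit.plugins.elevenlabs.Voice` (if it is indeed a dataclass) would show which arguments the installed version expects. If `name` and `category` turn out to be required, the `Voice(...)` call would raise a `TypeError` before any audio is requested.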

Don-Chad • Oct 16 '24 22:10