
When allow_interruption is set to False, assistant messages won't get appended to LLM context

Open · JulianGerhard21 opened this issue 7 months ago • 7 comments

Is this reporting a bug or feature request? A bug, or at least unintended behaviour.

Environment
pipecat-ai version: 0.0.63
python version: 3.10
OS: macOS

Issue description
When allow_interruptions is set to False, you'll get

LLM: Generating chat [[{"role": "system", "content": "...", {"role": "user", "content": "..."}, {"role": "user", "content": "..."}, ...]]

when using OpenAILLMContext. This happens in every case, even when the system clearly produced valid output for the TTS component: the assistant messages simply don't get appended, which breaks the business logic.

Repro steps
Set allow_interruptions = False, use any OpenAILLMContext-based pipeline, and you will see this happen on every iteration.

Expected behavior
The assistant messages get properly appended to the context that the LLM uses when generating a response.

Actual behavior
Only the system message and consecutive user messages get appended to the context.

JulianGerhard21 commented on Apr 15 '25

Would you mind sharing your example code? I'm using OpenAILLMContext with the allow_interruptions option and I don't seem to get your problem.

Here're some of my logs: 2025-04-15 10:16:30.400 | DEBUG | pipecat.services.google.llm:_process_context:507 - GoogleLLMService#15: Generating chat [[{'parts': [{'text': "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio so don't include special characters in your answers. Respond to what the user said in a creative and helpful way."}], 'role': 'user'}, {'parts': [{'text': "Okay, I understand. I'm ready to assist you in this WebRTC call. Just let me know what you need. I can summarize information, answer questions, generate creative content, or whatever else you might find useful in our conversation. I'll keep my responses concise and easy to understand. How can I help you today?"}], 'role': 'model'}, {'parts': [{'text': 'Hey.'}], 'role': 'user'}, {'parts': [{'text': 'Hello there! What can I do for you?'}], 'role': 'model'}, {'parts': [{'text': "I'm sorry. What are you again?"}], 'role': 'user'}, {'parts': [{'text': 'No problem! I am a helpful AI, designed to assist you during this WebRTC call. Think of me as a super-powered assistant here to provide information, answer questions, or generate creative content as needed. Just let me know how I can be of service!'}], 'role': 'model'}, {'parts': [{'text': 'Okay. I remember'}], 'role': 'user'}, {'parts': [{'text': 'I remember that now. So are you Gemini?'}], 'role': 'user'}, {'parts': [{'text': "While I share technology with Gemini, I'm running independently in this environment to best assist you. What can I do for you today?"}], 'role': 'model'}, {'parts': [{'text': 'Answer my question? Are you Gemini?'}], 'role': 'user'}, {'parts': [{'text': 'I have been trained by Google and am a large language model. I am running independently from Gemini in this environment.'}], 'role': 'model'}, {'parts': [{'text': 'Okay. So are you there?'}], 'role': 'user'}, {'parts': [{'text': "Yes, I'm here and ready to assist you. How can I help you with the call?"}], 'role': 'model'}, {'parts': [{'text': 'Tell me a joke.'}], 'role': 'user'}, {'parts': [{'text': "Why don't scientists trust atoms? Because they make up everything!"}], 'role': 'model'}, {'parts': [{'text': 'Okay. Stop that.'}], 'role': 'user'}, {'parts': [{'text': 'Stop there.'}], 'role': 'user'}]]

ken-kuro commented on Apr 15 '25

Of course, no problem. I am currently working with:

<import_statements>

logger.remove(0)
logger.add(sys.stderr, level="DEBUG")

twilio = Client(
    os.environ.get("TWILIO_ACCOUNT_SID"), os.environ.get("TWILIO_AUTH_TOKEN")
)

async def main(websocket_client, stream_sid, call_sid):
    transport = FastAPIWebsocketTransport(
        websocket=websocket_client,
        params=FastAPIWebsocketParams(
            audio_out_enabled=True,
            add_wav_header=False,
            vad_enabled=True,
            vad_analyzer=SileroVADAnalyzer(),
            vad_audio_passthrough=True,
            serializer=TwilioFrameSerializer(stream_sid),
        ),
    )

    tools = ToolsSchema(standard_tools=[end_twilio_call_schema, forward_twilio_call_schema, smo_plugin_call_schema])

    stt = GladiaSTTService(
        api_key=os.environ.get("GLADIA_API_KEY"),
        model="solaria-1",
        params=GladiaInputParams(
            language_config=LanguageConfig(
                languages=[Language.DE, Language.EN],
                code_switching=True
            ),
        )
    )

    llm = OpenAILLMService(
        name="LLM",
        api_key=os.environ.get("OPENAI_API_KEY"),
        model="gpt-4.1-mini-2025-04-14",
        params=OpenAILLMService.InputParams(
            temperature=0.2
        )
    )

    llm.register_function("end_twilio_call", end_twilio_call)
    llm.register_function("forward_twilio_call", forward_twilio_call)
    llm.register_function("smo_plugin_call", smo_plugin_call)

    tts = ElevenLabsTTSService(
        api_key=os.getenv("ELEVENLABS_API_KEY"),
        voice_id="wcGcDDfRHvH6LR9p07u4"
    )

    messages = [
        {
            "role": "system",
            "content": """<customer_specific_system_prompt>""
        },
        {
            "role": "assistant",
            "content": "<initial_utterance>"
        }
    ]

    context = OpenAILLMContext(
        messages=messages,
        tools=tools
    )

    context_aggregator = llm.create_context_aggregator(context)

    pipeline = Pipeline(
        [
            transport.input(),
            stt,
            context_aggregator.user(),
            llm,
            tts,
            transport.output(),
            context_aggregator.assistant(),
        ]
    )

    task = PipelineTask(
        pipeline,
        params=PipelineParams(
            allow_interruptions=True,
            enable_metrics=True
        )
    )

    await task.queue_frame(TTSSpeakFrame("initial_utterance"))

    runner = PipelineRunner(handle_sigint=False)

    await runner.run(task)

Given allow_interruptions=True, on the second iteration (after the LLM has asked its second question) I receive:

[
  [
    {
      "role": "system",
      "content": "<system_prompt>"
    },
    {
      "role": "assistant",
      "content": "<initial_utterance>"
    },
    {
      "role": "user",
      "content": " Ich w\u00fcrde ganz gerne einen Termin <entity> vereinbaren."
    },
    {
      "role": "assistant",
      "content": "Vielen Dank. M\u00f6chten Sie fortfahren?"
    },
    {
      "role": "user",
      "content": " Ja, ich m\u00f6chte fortfahren."
    }
  ]
]

so far so good.

Given allow_interruptions=False, on the second iteration (after the LLM has asked its second question) I receive:

[
  [
    {
      "role": "system",
      "content": "<system_prompt>"
    },
    {
      "role": "assistant",
      "content": "<initial_utterance>"
    },
    {
      "role": "user",
      "content": " Ich w\u00fcrde ganz gerne einen Termin <entity> vereinbaren."
    },
    {
      "role": "user",
      "content": " Ja, m\u00f6chte ich."
    }
  ]
]

I think the difference is obvious: with allow_interruptions=False, the assistant message ("Vielen Dank. Möchten Sie fortfahren?") is missing from the context sent to the LLM. This behaviour reproduces consistently in both cases.

JulianGerhard21 commented on Apr 16 '25

Oh, maybe I misunderstood what you meant. I thought you were saying that setting allow_interruptions to True while using OpenAILLMContext corrupts the LLM context. In the example logs I posted above, you can see that there are places where multiple user parts appear consecutively, but that's not an error; it's just the VAD doing its work, since there is no bot response in between, so it's completely normal. But what you're showing suggests you only get the problem with allow_interruptions set to False? I also wonder whether, between those user parts, there is an LLM response that isn't being captured in the history.

ken-kuro commented on Apr 16 '25

In the standard case, I do not want to allow any interruptions, so I set the parameter to False. However, I now notice that as soon as I do this, subsequent "assistant" messages no longer appear in the history sent to the LLM, so the context is incomplete.

JulianGerhard21 commented on Apr 16 '25

I just noticed my mistake: the headline was misleading. I have corrected it.

JulianGerhard21 commented on Apr 16 '25

@JulianGerhard21 I can confirm what you're seeing, but I'm not sure what the intended behavior is. I'll have to ask @aconchillo.

Essentially 100% of Pipecat interactions run with interruptions enabled. If you want to prevent the bot from being interrupted, I'd recommend using something like the STTMuteFilter with a strategy set to ALWAYS.

Check out the docs and demo.
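
For illustration, here is a minimal sketch of that setup applied to the pipeline above. The import path and config fields (STTMuteFilter, STTMuteConfig, STTMuteStrategy, strategies=...) are assumed from recent pipecat versions and may differ in yours, so treat it as a sketch rather than a verified snippet:

from pipecat.processors.filters.stt_mute_filter import (
    STTMuteConfig,
    STTMuteFilter,
    STTMuteStrategy,
)

# Mute user speech whenever the bot is speaking, so the bot cannot be interrupted.
stt_mute_filter = STTMuteFilter(
    config=STTMuteConfig(strategies={STTMuteStrategy.ALWAYS})
)

pipeline = Pipeline(
    [
        transport.input(),
        stt_mute_filter,  # placed between transport input and the STT service
        stt,
        context_aggregator.user(),
        llm,
        tts,
        transport.output(),
        context_aggregator.assistant(),
    ]
)

task = PipelineTask(
    pipeline,
    params=PipelineParams(
        allow_interruptions=True,  # interruptions stay enabled, as recommended
        enable_metrics=True,
    ),
)

With this arrangement the assistant messages keep being aggregated into the context (since interruptions remain enabled), while the user's speech is simply ignored whenever the bot is talking.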

As for the issue, let's see what @aconchillo has to say. This is a relatively old part of Pipecat, before I started focusing on the project. He knows the history.

markbackman commented on Apr 17 '25

I also noticed that with allow_interruptions=False, the conversation is not coherent, because the agent seems not to remember what it said previously.

pesterhazy commented on Apr 28 '25