vocode-python [Bug]: Streaming conversation getting stuck if using InputStreamingSynthesizer

Brief Description

I am trying to use ElevenLabsWSSynthesizer in a streaming conversation, but the conversation gets stuck. I understand from the code that it's marked as "experimental", but I expected the following test to pass.

Can anyone help me understand why this fails?

Any help greatly appreciated!

LLM

None

Transcription Services

None

Synthesis Services

None

Telephony Services

None

Conversation Type and Platform

This is a test environment for real-time streaming conversations, using a dummy transcriber and a dummy "input streaming" synthesizer.

Steps to Reproduce

1. Apply the attached patch file to get the new test.
2. Run poetry run pytest -s -v tests/streaming/test_streaming_conversation.py::test_streaming_conversation_pipeline_streaming_synthesizer
3. Observe the assertion error after the conversation times out.

Expected Behavior

The conversation should not time out and the test should pass.

Screenshots

The following is the output after the reproduction steps:

tests/streaming/test_streaming_conversation.py::test_streaming_conversation_pipeline_streaming_synthesizer FAILED

========================================================================================== FAILURES ==========================================================================================
_________________________________________________________________ test_streaming_conversation_pipeline_streaming_synthesizer _________________________________________________________________

mocker = <pytest_mock.plugin.MockerFixture object at 0x75efd19189e0>

    @pytest.mark.asyncio
    async def test_streaming_conversation_pipeline_streaming_synthesizer(
        mocker: MockerFixture,
    ):
        """
        Test conversation pipeline using InputStreamingSynthesizer
        """
        output_device = DummyOutputDevice(sampling_rate=48000, audio_encoding=AudioEncoding.LINEAR16)
        streaming_conversation = StreamingConversation(
            output_device=output_device,
            transcriber=TestAsyncTranscriber(
                TestTranscriberConfig(
                    sampling_rate=48000,
                    audio_encoding=AudioEncoding.LINEAR16,
                    chunk_size=480,
                )
            ),
            agent=EchoAgent(
                EchoAgentConfig(initial_message=BaseMessage(text="Hi there")),
            ),
            synthesizer=TestSynthesizer(TestSynthesizerConfig.from_output_device(output_device)),
        )
        await streaming_conversation.start()
        await streaming_conversation.initial_message_tracker.wait()
        initial_message_audio_chunk = await output_device.dummy_playback_queue.get()
        assert initial_message_audio_chunk.data == b"Hi there"
    
        await asyncio.sleep(2)
    
        streaming_conversation.receive_audio(b"who are you?")
    
        first_response_audio_chunk = await output_device.dummy_playback_queue.get()
>       assert first_response_audio_chunk.data == b"who are you?"
E       AssertionError: assert b'Are you there?' == b'who are you?'
E         
E         At index 0 diff: b'A' != b'w'
E         
E         Full diff:
E         - (b'who are you?')
E         + (b'Are you there?')

tests/streaming/test_streaming_conversation.py:647: AssertionError
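
One possible reading of the assertion above is that the echoed response never reached the playback queue, and that an idle prompt ("Are you there?") arrived first. The sketch below reproduces only that pattern with plain asyncio. It is not vocode code: `stuck_responder`, `idle_watchdog`, and `first_chunk` are hypothetical names invented for illustration, and the idle-prompt behaviour is an assumption drawn from the traceback, not a confirmed diagnosis.

```python
import asyncio


async def idle_watchdog(queue: asyncio.Queue, idle_seconds: float) -> None:
    # Hypothetical stand-in for an idle check: after `idle_seconds` with no
    # reply, it enqueues a prompt of its own.
    await asyncio.sleep(idle_seconds)
    queue.put_nowait(b"Are you there?")


async def stuck_responder(queue: asyncio.Queue, text: bytes) -> None:
    # Hypothetical stand-in for a response path that never completes:
    # it waits on an event nothing ever sets, so `text` is never queued.
    await asyncio.Event().wait()
    queue.put_nowait(text)


async def first_chunk(idle_seconds: float = 0.05) -> bytes:
    queue: asyncio.Queue = asyncio.Queue()
    tasks = [
        asyncio.create_task(stuck_responder(queue, b"who are you?")),
        asyncio.create_task(idle_watchdog(queue, idle_seconds)),
    ]
    try:
        # Bound the wait so a stuck pipeline fails fast instead of hanging.
        return await asyncio.wait_for(queue.get(), timeout=1.0)
    finally:
        # Clean up both background tasks regardless of the outcome.
        for task in tasks:
            task.cancel()
        await asyncio.gather(*tasks, return_exceptions=True)


if __name__ == "__main__":
    print(asyncio.run(first_chunk()))
```

If this pattern matches what happens in the real pipeline, the first item on the queue is the idle prompt and the same comparison against b"who are you?" fails. Wrapping each `queue.get()` in the test with `asyncio.wait_for` would also make a genuinely stuck conversation fail quickly.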

Jul 30 '24 09:07 ss14

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

Sep 29 '24 02:09 github-actions[bot]

This issue has been automatically closed due to inactivity. Thank you for your contributions.

Oct 07 '24 02:10 github-actions[bot]