examples for 1.0

Open tinalenguyen opened this issue 9 months ago • 8 comments

to be reviewed:

  • dentist scheduler: a multi-agent example offering different functionalities integrated via Cal.com and Supabase APIs
  • conversation persistor (realtime and pipeline): an updated version for 1.0 events
  • conversation recorder: example of capturing input/output audio frames for a WAV recording via the stt and tts nodes

tinalenguyen avatar Feb 20 '25 09:02 tinalenguyen

⚠️ No Changeset found

Latest commit: 912c01e7fab91ddcf68f11bcb7ba5be91486fe33

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types

changeset-bot[bot] avatar Feb 20 '25 09:02 changeset-bot[bot]

for ruff, you can automatically fix (most of) the imports using ruff check --fix

theomonnom avatar Mar 10 '25 08:03 theomonnom

pydub depends on audioop, which was deprecated in Python 3.11 and removed in Python 3.13. There is a PR, but it has not been merged yet.
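To see whether a given interpreter is affected, a quick check like the one below may help. It is only a sketch: `audioop_status` is a hypothetical helper name, and it tests for the module's presence without importing pydub.

```python
import importlib.util
import sys


def audioop_status() -> str:
    """Report whether the `audioop` module can be found on this interpreter."""
    # find_spec only locates the module; it does not import it.
    if importlib.util.find_spec("audioop") is not None:
        return "available"
    if sys.version_info >= (3, 13):
        return "missing (removed from the standard library in Python 3.13)"
    return "missing"


if __name__ == "__main__":
    print(f"Python {sys.version_info.major}.{sys.version_info.minor}: audioop {audioop_status()}")
```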

Bilal-io avatar May 06 '25 21:05 Bilal-io

@Bilal-io thanks for the heads up, i decided to opt for LK's audio resampler which uses SoX

tinalenguyen avatar May 12 '25 21:05 tinalenguyen

@tinalenguyen there is an issue since your last update. I tried to debug it but I am unable to find a solution. The final audio includes the participant's speech but not the agent's TTS. Are you seeing the same issue?

Bilal-io avatar May 14 '25 15:05 Bilal-io

@Bilal-io I can't seem to replicate that problem, are you using the same pipeline setup as the conversation_recorder.py example?

tinalenguyen avatar May 15 '25 06:05 tinalenguyen

Yes @tinalenguyen, I am using the same code you shared. Here is a gist. I am able to converse with the agent without any issue, but as stated before, the final audio file contains my speech without the TTS part, just silence.

Bilal-io avatar May 15 '25 14:05 Bilal-io

@Bilal-io Thank you for the gist, I was able to replicate the issue and fix it! Let me know if it works now :)

tinalenguyen avatar May 16 '25 04:05 tinalenguyen

Hey @tinalenguyen, thank you for the quick fix. I am seeing two different issues:

1- The first call works fine, but the second call causes an error at ...return stt(self, record_audio(), model_settings)... I fixed this by following the same pattern as the tts_recorder: instead of returning stt(self, record_audio(), model_settings) directly, I did this:

    async for result in stt(self, record_audio(), model_settings):
        yield result

2- The agent's audio sounds great during the call but comes out choppy in the saved file. This is the case even without the change mentioned above. I've attached an audio sample (converted to mp4 so that I could attach it here). I'm not sure whether this is related to LiveKit itself or to your implementation.
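For reference, the re-yield pattern from item 1 can be sketched without LiveKit at all. Everything below is a stand-in: `fake_stt`, `record_audio`, `stt_node`, and `collect` are hypothetical names and simplified signatures, not the library's API. The thread does not say why the direct return failed on the second call; this only illustrates the shape of the fix, where the node is itself an async generator that taps the audio and forwards every event.

```python
import asyncio
from typing import AsyncIterator


async def record_audio(frames: list[bytes], sink: list[bytes]) -> AsyncIterator[bytes]:
    """Tap the incoming frames: copy each into `sink`, then pass it on."""
    for frame in frames:
        sink.append(frame)
        yield frame


async def fake_stt(audio: AsyncIterator[bytes]) -> AsyncIterator[str]:
    """Stand-in for a speech-to-text step; emits one event per frame."""
    async for frame in audio:
        yield f"event:{len(frame)}"


async def stt_node(frames: list[bytes], sink: list[bytes]) -> AsyncIterator[str]:
    # Re-yield every event instead of returning the inner generator object.
    async for event in fake_stt(record_audio(frames, sink)):
        yield event


async def collect(frames: list[bytes]) -> tuple[list[str], list[bytes]]:
    """Drive the node once and return both the events and the recorded frames."""
    sink: list[bytes] = []
    events = [event async for event in stt_node(frames, sink)]
    return events, sink


if __name__ == "__main__":
    print(asyncio.run(collect([b"ab", b"cde"])))
```

Note that `record_audio(...)` is called, so the node receives an async iterator rather than the function object.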

I appreciate your input

https://github.com/user-attachments/assets/ba35bbd2-b63b-47ea-814a-d414290303e3

Bilal-io avatar May 21 '25 13:05 Bilal-io

@Bilal-io Good catch, that approach makes more sense!

As for the agent audio, I suspect it's from STT audio cutting in during the agent's speech and not mixing well. I've alleviated it by changing the quality of the resampler to very high:

self._audio_resampler = AudioResampler(input_rate=frame.sample_rate, output_rate=FRAMERATE, quality="very_high")

If the audio still isn't consistent, let me know and I'll look into crossfading/transitioning the audio streams. Thank you again for trying out my work!!
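For anyone curious what crossfading could look like, here is a generic sketch of a linear crossfade between two mono 16-bit PCM chunks. It is not code from this PR: `crossfade` is a hypothetical helper, it uses only the standard library, and whether it would actually remove the choppiness is untested.

```python
from array import array


def crossfade(a: array, b: array, overlap: int) -> array:
    """Join two mono int16 sample arrays, blending `overlap` samples linearly."""
    overlap = min(overlap, len(a), len(b))
    if overlap == 0:
        return a + b
    # Keep the part of `a` that is not blended.
    out = a[: len(a) - overlap]
    for i in range(overlap):
        w = (i + 1) / (overlap + 1)  # weight of the incoming chunk, 0 < w < 1
        mixed = (1.0 - w) * a[len(a) - overlap + i] + w * b[i]
        # Clamp to the int16 range before storing.
        out.append(max(-32768, min(32767, round(mixed))))
    # Append the part of `b` that is not blended.
    out.extend(b[overlap:])
    return out
```

The result is `overlap` samples shorter than plain concatenation, so a recorder using it would need to account for that when keeping the two streams aligned.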

tinalenguyen avatar May 22 '25 05:05 tinalenguyen

Thank you @tinalenguyen for looking into this. The audio recording still contains the stuttering.

Also, the change you made needs to call record_audio: async for event in stt(self, record_audio, model_settings) passes the function itself, so it should be async for event in stt(self, record_audio(), model_settings) instead.

Another issue I faced was a deadlock when deleting the file. I had to update aclose to the following:

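    async def aclose(self) -> None:
        # Enqueue None and wait for the main task to finish before touching the file.
        self._audio_q.put_nowait(None)
        await self._main_atask
        # Write out any samples still buffered in the resampler.
        if self._audio_resampler:
            frames = self._audio_resampler.flush()
            if frames:
                for flushed_frame in frames:
                    self._file.writeframes(flushed_frame.data.tobytes())
        self._audio_resampler = None
        self._current_input_rate = 0
        # Close the file last so it can be deleted afterwards.
        self._file.close()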
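The ordering in that aclose (signal the writer, wait for it, then close the file) can be shown in isolation. The sketch below is an assumption-heavy stand-in rather than the example's real class: `WavRecorder` and `push` are hypothetical, the resampler flush is left out, and it assumes `None` is the sentinel that ends the writer loop.

```python
import asyncio
import wave


class WavRecorder:
    """Hypothetical recorder: a background task drains a queue into a WAV file."""

    def __init__(self, path: str, sample_rate: int = 16000) -> None:
        # Must be constructed inside a running event loop (create_task needs one).
        self._file = wave.open(path, "wb")
        self._file.setnchannels(1)
        self._file.setsampwidth(2)  # 16-bit PCM
        self._file.setframerate(sample_rate)
        self._audio_q: asyncio.Queue = asyncio.Queue()
        self._main_atask = asyncio.create_task(self._main_task())

    def push(self, pcm: bytes) -> None:
        """Queue raw PCM bytes for the writer task."""
        self._audio_q.put_nowait(pcm)

    async def _main_task(self) -> None:
        while True:
            pcm = await self._audio_q.get()
            if pcm is None:  # sentinel: no more audio will arrive
                break
            self._file.writeframes(pcm)

    async def aclose(self) -> None:
        self._audio_q.put_nowait(None)  # wake the writer so it can exit
        await self._main_atask  # everything queued before the sentinel is written
        self._file.close()  # close only after the writer has stopped
```

Because the sentinel is queued behind any pending audio, the writer drains the queue before exiting, and the file is never closed while a write is still in flight.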

Bilal-io avatar May 22 '25 21:05 Bilal-io

@Bilal-io Thank you for the feedback, I ended up rewriting most of it and I think it works way better now. Sorry about the bugs/delay, this must be a sign for me to stop coding at 4 AM..

Let me know what you think, and thanks again!!

tinalenguyen avatar Jun 03 '25 08:06 tinalenguyen