examples for 1.0
to be reviewed:
- dentist scheduler: a multi-agent example offering different functionalities integrated via Cal.com and Supabase APIs
- conversation persistor (realtime and pipeline): an updated version for 1.0 events
- conversation recorder: an example of capturing input/output audio frames for a WAV recording via the STT and TTS nodes
⚠️ No Changeset found
Latest commit: 912c01e7fab91ddcf68f11bcb7ba5be91486fe33
Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.
For ruff, you can automatically fix most of the import issues by running ruff check --fix
pydub depends on audioop, which was deprecated in Python 3.11 and removed in Python 3.13. There is a PR, but it has not been merged yet.
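A quick way to see whether this affects a given environment is to check if audioop can be found at all. This is a minimal sketch using only the standard library; the helper name audioop_available is hypothetical, and it will also report True if a third-party backport that provides an audioop module happens to be installed.

```python
import importlib.util


def audioop_available() -> bool:
    """Return True if a module named audioop can be located by this interpreter."""
    # find_spec returns None when the top-level module cannot be found,
    # which is the expected result on a stock Python 3.13+ install.
    return importlib.util.find_spec("audioop") is not None


if __name__ == "__main__":
    if audioop_available():
        print("audioop is importable; libraries that rely on it should load")
    else:
        print("audioop is missing; libraries that import it will fail at import time")
```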
@Bilal-io thanks for the heads up, I decided to opt for LK's audio resampler, which uses SoX
@tinalenguyen there is an issue since your last update. I tried to debug it but I am unable to find a solution. The final audio includes the participant's speech but not the agent's TTS. Are you seeing the same issue?
@Bilal-io I can't seem to replicate that problem, are you using the same pipeline setup as the conversation_recorder.py example?
Yes @tinalenguyen, I am using the same code you shared. Here is a gist. I am able to converse with the agent without any issue, but as stated before, the final audio file contains my speech without the TTS part, just silence.
@Bilal-io Thank you for the gist, I was able to replicate the issue and fix it! Let me know if it works now :)
Hey @tinalenguyen thank you for the quick fix.
I am seeing two different issues:
1- The first call works fine, but the second call causes an error at ...return stt(self, record_audio(), model_settings)... I fixed this by using the same pattern as the tts_recorder: instead of returning stt(self, record_audio(), model_settings) directly, I did this:
async for result in stt(self, record_audio(), model_settings):
    yield result
2- The agent's audio sounds great while speaking but comes out choppy in the saved file. This happens even without the change mentioned above. I've attached an audio sample (converted to mp4 so I could attach it here). I'm not sure if this is related to LiveKit itself or your implementation.
I appreciate your input.
https://github.com/user-attachments/assets/ba35bbd2-b63b-47ea-814a-d414290303e3
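For readers following along, the re-yield pattern from item 1 can be sketched in isolation. This is not the example's actual code: fake_stt, stt_node, mic, and run_call are hypothetical stand-ins, and the real STT node takes different arguments. The sketch only shows a node that is itself an async generator, taps the incoming audio, and forwards each downstream event with async for ... yield.

```python
import asyncio
from typing import AsyncIterator


async def fake_stt(audio: AsyncIterator[bytes]) -> AsyncIterator[str]:
    """Hypothetical stand-in for the default STT node: one event per audio chunk."""
    async for chunk in audio:
        yield f"event:{len(chunk)}"


async def stt_node(audio: AsyncIterator[bytes], recorded: list) -> AsyncIterator[str]:
    """Tap the audio stream for recording, then re-yield every STT event."""

    async def record_audio() -> AsyncIterator[bytes]:
        async for chunk in audio:
            recorded.append(chunk)  # keep a copy for the recording
            yield chunk             # pass the chunk on unchanged

    # Iterate and re-yield instead of returning the inner iterator directly.
    async for event in fake_stt(record_audio()):
        yield event


async def mic(chunks: list) -> AsyncIterator[bytes]:
    """Hypothetical audio source."""
    for chunk in chunks:
        yield chunk


async def run_call(chunks: list) -> tuple:
    recorded: list = []
    events = [event async for event in stt_node(mic(chunks), recorded)]
    return events, recorded


if __name__ == "__main__":
    print(asyncio.run(run_call([b"ab", b"cde"])))
```

Each call to run_call builds a fresh generator chain, so nothing is shared between the first and second call in this sketch.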
@Bilal-io Good catch, that approach makes more sense!
As for the agent audio, I suspect it's from STT audio cutting in during the agent's speech and not mixing well. I've alleviated it by changing the quality of the resampler to very high:
self._audio_resampler = AudioResampler(input_rate=frame.sample_rate, output_rate=FRAMERATE, quality="very_high")
If the audio still isn't consistent, let me know and I'll look into crossfading/transitioning the audio streams. Thank you again for trying out my work!!
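As a rough illustration of what crossfading could look like, here is a minimal sketch of a linear crossfade between two equal-length buffers of 16-bit mono PCM. It uses only the standard library, the function name crossfade_pcm16 is hypothetical, and whether a crossfade would actually remove the stutter in the recording is untested.

```python
from array import array


def crossfade_pcm16(tail: bytes, head: bytes) -> bytes:
    """Linearly crossfade two equal-length buffers of native-endian 16-bit mono PCM."""
    a = array("h")
    a.frombytes(tail)
    b = array("h")
    b.frombytes(head)
    if len(a) != len(b):
        raise ValueError("buffers must contain the same number of samples")

    n = len(a)
    out = array("h")
    for i in range(n):
        # The weight moves from 0.0 (all tail) towards 1.0 (all head).
        w = i / n
        out.append(int(round(a[i] * (1.0 - w) + b[i] * w)))
    return out.tobytes()


if __name__ == "__main__":
    fading_out = array("h", [1000] * 4).tobytes()
    fading_in = array("h", [0] * 4).tobytes()
    mixed = array("h")
    mixed.frombytes(crossfade_pcm16(fading_out, fading_in))
    print(list(mixed))
```

Because each output sample is a weighted average of two in-range samples, the result cannot overflow the 16-bit range.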
Thank you @tinalenguyen for looking into this. The audio recording still contains the stuttering.
Also, the code you changed needs to call record_audio: async for event in stt(self, record_audio, model_settings) passes the function itself rather than the generator it returns, so it should be async for event in stt(self, record_audio(), model_settings) instead.
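The difference between passing record_audio and record_audio() can be reproduced without any LiveKit code. In this minimal sketch, consume and demo are hypothetical helpers; the point is that an async generator function is not itself async-iterable, while the object returned by calling it is.

```python
import asyncio
from typing import AsyncIterator


async def record_audio() -> AsyncIterator[bytes]:
    """Hypothetical audio source yielding two chunks."""
    yield b"a"
    yield b"b"


async def consume(source) -> list:
    # async for needs an object with __aiter__; a plain function has none.
    return [chunk async for chunk in source]


async def demo() -> tuple:
    ok = await consume(record_audio())  # called: an async generator object
    try:
        await consume(record_audio)     # not called: just a function
        error = ""
    except TypeError as exc:
        error = type(exc).__name__
    return ok, error


if __name__ == "__main__":
    print(asyncio.run(demo()))
```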
Another issue I faced was with deleting the file due to a deadlock. I had to update the aclose to the following:
async def aclose(self) -> None:
    self._audio_q.put_nowait(None)
    await self._main_atask
    if self._audio_resampler:
        frames = self._audio_resampler.flush()
        if frames:
            for flushed_frame in frames:
                self._file.writeframes(flushed_frame.data.tobytes())
    self._audio_resampler = None
    self._current_input_rate = 0
    self._file.close()
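The shutdown ordering in that aclose (push a sentinel, wait for the writer task, then close the file) can be sketched on its own. This is a simplified, hypothetical WavRecorder that skips resampling entirely and writes raw 16-bit mono PCM with the standard wave module; the attribute names mirror the snippet above, but everything else is an assumption rather than the example's real implementation.

```python
import asyncio
import wave


class WavRecorder:
    """Hypothetical recorder: queues PCM chunks and writes them from a single task."""

    def __init__(self, path: str, sample_rate: int = 16000) -> None:
        # Must be constructed while an event loop is running (create_task needs one).
        self._file = wave.open(path, "wb")
        self._file.setnchannels(1)
        self._file.setsampwidth(2)  # 16-bit samples
        self._file.setframerate(sample_rate)
        self._audio_q = asyncio.Queue()
        self._main_atask = asyncio.create_task(self._main_task())

    def push(self, pcm: bytes) -> None:
        self._audio_q.put_nowait(pcm)

    async def _main_task(self) -> None:
        while True:
            chunk = await self._audio_q.get()
            if chunk is None:  # sentinel: no more audio will arrive
                break
            self._file.writeframes(chunk)

    async def aclose(self) -> None:
        # Unblock the writer with a sentinel, wait until it has drained
        # the queue, and only then close the file.
        self._audio_q.put_nowait(None)
        await self._main_atask
        self._file.close()


async def record(path: str, chunks: list) -> None:
    recorder = WavRecorder(path)
    for chunk in chunks:
        recorder.push(chunk)
    await recorder.aclose()


if __name__ == "__main__":
    asyncio.run(record("sketch.wav", [b"\x00\x00" * 160]))
```

Because the sentinel is queued after all audio chunks, the writer task finishes every pending write before aclose closes the file, so the file is no longer in use when the caller deletes it.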
@Bilal-io Thank you for the feedback, I ended up rewriting most of it and I think it works way better now. Sorry about the bugs/delay, this must be a sign for me to stop coding at 4 AM..
Let me know what you think, and thanks again!!