
fix: fix speech_buffer missing data in VADStream

Open • longcw opened this pull request on Oct 11 '24 • 5 comments

Originally, speech_buffer received only a subset of the audio data in each audio frame from the mic because of this line:

to_copy_buffer = min(self._model.window_size_samples, available_space)

This breaks any subsequent non-streaming STT that works on the buffered speech (e.g. openai.STT). This PR fixes it by copying all of the data into the buffer.
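To make the failure mode concrete, here is a minimal, self-contained sketch. It is not the plugin's actual code: `SpeechBuffer`, `push_frame_buggy`, `push_frame_fixed`, and the 512-sample `WINDOW_SIZE_SAMPLES` value are hypothetical stand-ins, and the sketch assumes each chunk handed to the buffer can hold more samples than one VAD window. Under those assumptions, capping the copy at one window silently drops the remainder, while copying the whole chunk (bounded only by free space) keeps it.

```python
# Hypothetical stand-in for self._model.window_size_samples (value assumed).
WINDOW_SIZE_SAMPLES = 512


class SpeechBuffer:
    """Hypothetical fixed-capacity buffer of PCM samples (not the real API)."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.samples: list[int] = []

    def available_space(self) -> int:
        # Number of samples that can still be written.
        return self.capacity - len(self.samples)

    def write(self, chunk: list[int], n: int) -> None:
        # Append at most n samples from the start of the chunk.
        self.samples.extend(chunk[:n])


def push_frame_buggy(buf: SpeechBuffer, chunk: list[int]) -> None:
    # Mirrors the original logic: the copy is capped at one VAD window,
    # so any samples beyond WINDOW_SIZE_SAMPLES are never buffered.
    to_copy_buffer = min(WINDOW_SIZE_SAMPLES, buf.available_space())
    buf.write(chunk, to_copy_buffer)


def push_frame_fixed(buf: SpeechBuffer, chunk: list[int]) -> None:
    # Copies the whole chunk, limited only by the remaining capacity.
    to_copy_buffer = min(len(chunk), buf.available_space())
    buf.write(chunk, to_copy_buffer)


if __name__ == "__main__":
    chunk = list(range(2 * WINDOW_SIZE_SAMPLES))  # two windows' worth of samples

    buggy = SpeechBuffer(capacity=16_000)
    push_frame_buggy(buggy, chunk)

    fixed = SpeechBuffer(capacity=16_000)
    push_frame_fixed(fixed, chunk)

    print(len(buggy.samples), len(fixed.samples))  # 512 1024
```

Since a non-streaming STT only sees what ends up in the buffer, losing half of every chunk in this toy setup would hand the transcriber choppy, incomplete audio, which is consistent with the symptom described above.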

longcw, Oct 11 '24 10:10

⚠️ No Changeset found

Latest commit: fe3ff6db9492106fc64c5ce2a250b18d535bc6d4

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types


changeset-bot[bot], Oct 11 '24 10:10

CLA assistant check
All committers have signed the CLA.

CLAassistant, Oct 11 '24 10:10

I've spent a couple of days hoping to solve this problem without any success. Your EXCELLENT job has helped me a lot, thanks!

zhanghx0905, Oct 11 '24 11:10

Yeah, it's weird behavior in the original code to copy only part of the audio data into the buffer. After a few hours of debugging, it finally works well on my end. I also posted it here to find out whether it's a bug or intended. Glad it helped you as well :)

longcw, Oct 11 '24 16:10

Thank you for the fix; we'll review it and get it merged.

davidzhao, Oct 11 '24 16:10

Thanks! I needed some changes, so I created another PR: https://github.com/livekit/agents/pull/898

theomonnom, Oct 11 '24 23:10