issues with STT

Open kodjima33 opened this issue 9 months ago • 3 comments

I have spent the last 2 days building an app for omi. Our STT system is 2/10 reliable; it needs to be 9/10.

specifically:

  • [ ] Once every few minutes, data stops being sent to the omi app (plugin). I'm using the real-time transcription option.
  • [ ] Speech recognition is poor: only 80% of speech is recognized correctly (dev kit 2).
  • [ ] Once every few minutes, there is a 10-30 second delay in sending data.
  • [ ] It randomly reconnects sometimes, and for a few minutes no data is recorded.
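
One way to quantify the stalls and delays listed above is to log the arrival time of each transcript segment on the plugin side and flag long silences between consecutive arrivals. The sketch below assumes such a log exists; `find_gaps` and the 10-second threshold are hypothetical and not part of the omi codebase.

```python
from typing import List, Tuple


def find_gaps(timestamps: List[float], max_gap_s: float = 10.0) -> List[Tuple[float, float]]:
    """Return (start, end) pairs of consecutive arrivals more than max_gap_s apart.

    timestamps: arrival times in seconds (hypothetical log from the plugin).
    max_gap_s: assumed threshold; 10 s matches the lower bound of the reported delays.
    """
    ordered = sorted(timestamps)
    return [(a, b) for a, b in zip(ordered, ordered[1:]) if b - a > max_gap_s]


if __name__ == "__main__":
    # Segments arriving at these times contain two stalls: 4 -> 30 and 31 -> 75.
    arrivals = [0.0, 2.0, 4.0, 30.0, 31.0, 75.0]
    for start, end in find_gaps(arrivals):
        print(f"gap of {end - start:.0f}s between t={start:.0f}s and t={end:.0f}s")
```

Note that a gap in arrivals can also mean nobody was speaking, so the log would need to be compared against a recording with continuous speech.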

To finish this task, I need to see a test in which something was recorded for 10 hours and 95% of it was recorded correctly.
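
The issue does not say how "recorded correctly" should be measured. One common choice is word error rate (WER) against a manually checked reference transcript, reading the 95% target as WER of 0.05 or lower; that reading is an assumption, and `word_error_rate` below is an illustrative helper rather than anything from the omi repo.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must not be empty")

    # Levenshtein distance over words, keeping only the previous row.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    wer = word_error_rate("the quick brown fox", "the quick fox")
    print(f"WER = {wer:.2f}, word accuracy = {1 - wer:.2f}")
```

For a 10-hour recording, the reference transcript would likely have to be sampled rather than transcribed in full, and missing stretches of audio count as deletions, so dropped connections lower the score as well.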

/bounty $2000

kodjima33 avatar Mar 03 '25 01:03 kodjima33

💎 $2,000 bounty • omi

Steps to solve:

  1. Start working: Comment /attempt #1941 with your implementation plan
  2. Submit work: Create a pull request including /claim #1941 in the PR body to claim the bounty
  3. Receive payment: 100% of the bounty is received 2-5 days post-reward. Make sure you are eligible for payouts

Thank you for contributing to BasedHardware/omi!

Attempt | Started (GMT+0) | Solution
🟢 @nishantkluhera | Mar 10, 2025, 5:46:37 AM | WIP

algora-pbc[bot] avatar Mar 03 '25 01:03 algora-pbc[bot]

FYI: as I understand it, Cloud Run is not suitable as a backend for long-lived connections such as WebSockets because of its aggressive scale-down policy and short grace period (only 10 seconds). A WebSocket connection from the app to the Cloud Run backend can be dropped if Cloud Run scales down an in-use backend instance, which may explain the random disconnection and reconnection issues. The more frequently Cloud Run instances are scaled up and down, the more unstable the connections between the app and the backend become.

A temporary workaround is to set a high minimum instance count so that the number of instances stays stable. As a long-term solution, however, I would suggest migrating the backend services to GKE, which gives better control over scaling behavior and suits the characteristics of WebSocket connections.
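
Independently of where the backend runs, the client can soften the effect of a dropped connection by reconnecting with capped exponential backoff. This is not something proposed above, only a minimal sketch of the idea; `backoff_delays` and its default values are hypothetical.

```python
import random
from typing import List, Optional


def backoff_delays(
    base_s: float = 1.0,
    cap_s: float = 30.0,
    attempts: int = 6,
    jitter: float = 0.0,
    rng: Optional[random.Random] = None,
) -> List[float]:
    """Return the wait before each reconnect attempt: base * 2**n, capped at cap_s.

    jitter adds up to that fraction of the delay at random, so that many clients
    dropped by the same instance do not all reconnect at the same moment.
    """
    rng = rng or random.Random()
    delays = []
    for attempt in range(attempts):
        delay = min(cap_s, base_s * 2 ** attempt)
        delay += rng.uniform(0.0, jitter * delay)
        delays.append(delay)
    return delays


if __name__ == "__main__":
    print(backoff_delays())            # deterministic schedule without jitter
    print(backoff_delays(jitter=0.2))  # same schedule with up to 20% random jitter
```

Audio captured while the client is waiting would still need to be buffered locally and sent after reconnecting; otherwise the gap in the transcript remains.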

As for the poor speech recognition, I agree. Deepgram's Nova-2-general model does not perform well here. We could try the new Nova-3 model, which Deepgram claims is a significant improvement.

thainguyensunya avatar Mar 05 '25 10:03 thainguyensunya

/attempt #1941

nishantkluhera avatar Mar 10 '25 05:03 nishantkluhera