
Speaker diarization sucks

Open · kodjima33 opened this issue 1 year ago · 6 comments

I was speaking solo for 40 minutes, and my voice was identified in only 40% of conversations. It should be 100%.

kodjima33 · Jan 07 '25 02:01

[image]

kodjima33 · Jan 07 '25 03:01

The reason?

  1. Deepgram (DG) does not support speaker identification. Our current implementation is a bit of a workaround* and is based on DG's diarization.
  2. DG's diarization is poor, at least with the current model, nova-2-general. That's why you see multiple speakers (0-1).

*: We feed DG the user's recorded speech profile first, so that DG labels the user as speaker-0. This trick is why Nikita is missing for the first 30s right after connecting.
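The labeling side of the footnote's trick can be sketched as a post-processing step. This is a minimal sketch under stated assumptions, not Deepgram's API or the actual omi code: `Word`, `label_user`, and `profile_seconds` are hypothetical names, and it assumes the STT service returns per-word timestamps plus an integer speaker label, with the user's profile audio occupying the first `profile_seconds` of the stream.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    start: float  # seconds from the start of the stream (profile included)
    end: float
    speaker: int  # diarization label assigned by the STT service


def label_user(words: list[Word], profile_seconds: float) -> list[tuple[str, str, float]]:
    """Map diarized words to 'user'/'other' using a speech-profile prefix.

    Hypothetical setup: the first `profile_seconds` of audio sent to the
    STT service were the user's recorded speech profile.
    """
    # The label that dominates the profile window is assumed to be the user.
    prefix = [w.speaker for w in words if w.end <= profile_seconds]
    if not prefix:
        raise ValueError("no words recognized in the profile window")
    user_label = Counter(prefix).most_common(1)[0][0]

    result = []
    for w in words:
        if w.start < profile_seconds:
            continue  # drop the profile audio itself from the transcript
        role = "user" if w.speaker == user_label else "other"
        # Shift timestamps so the live audio starts at t = 0.
        result.append((w.text, role, w.start - profile_seconds))
    return result


words = [
    Word("hello", 0.0, 0.5, 0),
    Word("profile", 1.0, 1.5, 0),
    Word("hi", 30.2, 30.5, 0),
    Word("yo", 31.0, 31.4, 1),
]
print([r[:2] for r in label_user(words, 30.0)])
```

Because the whole mapping hinges on which label the diarizer assigns inside the profile window, any diarization error there (or a later label switch) propagates to the rest of the session.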

Solution?

  1. Soniox is the only platform good enough that supports speaker identification. The easy way is to use Soniox.
  2. Try a better DG model and implement a better mechanism to solve "the first 30s issue": the hard way.
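Speaker identification differs from diarization: instead of clustering anonymous speakers, each speaker is matched against a known voice profile. A minimal sketch of that matching step, assuming per-speaker embeddings and a profile embedding are already available from some speaker-embedding model (hypothetical; neither DG's diarization output nor this thread provides them). `cosine`, `identify`, and the `0.7` threshold are illustrative assumptions that would need tuning.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def identify(
    segments: dict[int, list[float]],
    profile: list[float],
    threshold: float = 0.7,  # hypothetical cutoff, must be tuned on real data
) -> dict[int, str]:
    """Return 'user' or 'other' for each diarization label.

    segments: diarization label -> mean speaker embedding for that label
    profile:  embedding of the user's enrolled speech profile
    """
    return {
        label: "user" if cosine(emb, profile) >= threshold else "other"
        for label, emb in segments.items()
    }


print(identify({0: [1.0, 0.0], 1: [0.0, 1.0], 2: [0.9, 0.1]}, [1.0, 0.0]))
```

One design consequence: several diarization labels can map to "user", so a single speaker that the diarizer splits into speakers 0 and 1 would still be attributed to the user.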

My suggestion? Try 2 first. Why? Soniox costs $0.4/h vs. DG's $0.21/h (with our 1-1 match, the cost is ~$0.105/h).
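To make the price gap concrete, here is a quick calculation using the per-hour figures quoted above. The 8 hours/day of recorded audio per user and the 30-day month are hypothetical usage assumptions, not numbers from this thread.

```python
# Per-hour prices as quoted in this thread (USD per hour of audio).
SONIOX_PER_HOUR = 0.40
DG_EFFECTIVE_PER_HOUR = 0.105  # DG's $0.21/h "with our 1-1 match"


def monthly_cost(rate_per_hour: float, hours_per_day: float, days: int = 30) -> float:
    """Transcription cost for one user over `days` days."""
    return rate_per_hour * hours_per_day * days


hours = 8  # hypothetical recorded hours per user per day
soniox = monthly_cost(SONIOX_PER_HOUR, hours)
dg = monthly_cost(DG_EFFECTIVE_PER_HOUR, hours)
print(f"Soniox: ${soniox:.2f}/month, DG: ${dg:.2f}/month, ratio {soniox / dg:.1f}x")
# prints "Soniox: $96.00/month, DG: $25.20/month, ratio 3.8x"
```

The ratio does not depend on the usage assumption: at these prices Soniox is roughly 3.8x the effective DG cost for any amount of audio.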

beastoin · Jan 20 '25 15:01

As of January 20, diarization is still bad.

kodjima33 · Jan 21 '25 03:01

This issue is linked to our feedback platform. For feedback and updates, please visit this link.

Hitting the wall.

We are using nova-2-general, Deepgram's best model for speaker diarization. There's no better model available for now.

Even if I fix the first 30s issue, speaker diarization still won't be great, because we rely heavily on Deepgram's diarization. Besides that, the fix is costly: it increases the complexity of the code base, which makes it harder for folks to contribute (see my draft PR).

My thought right now: focus on what we can control by improving the quality of the input audio.

Anyway, let's cool it down a bit.

beastoin · Feb 02 '25 04:02

@beastoin the audio from the device is already pretty good.

There are alternatives, such as Bee, that do it better. How?

kodjima33 · Feb 18 '25 02:02