NeMo Speaker Diarization - Audio Length Limit

Open sjsakshi opened this issue 2 years ago • 3 comments

Hi,

Thanks for the prompt reply on helping me out with diarization with unknown speakers.

I just had a question: is there a limit on the length of an audio file that can be processed for speaker diarization, or by any speaker model in general, with NeMo?

If I have files with 1.5 hrs length, can they be effectively diarized?

Thanks!!

Sep 02 '22 05:09 sjsakshi

Speaker diarization can be done on a 90-minute audio file, but it could take a few minutes to get the result, since the clustering algorithm relies on eigenvalue decomposition (O(n^3) complexity).

Accuracy-wise, diarization accuracy does not necessarily depend on the length of the audio.

A divide-and-conquer feature for faster clustering is a work in progress.
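
To give a feel for the O(n^3) point, here is a rough back-of-the-envelope sketch. It is not taken from the NeMo codebase: the helper names `num_segments` and `relative_eig_cost`, the 1.5 s window, and the 0.75 s shift are all assumptions chosen for illustration, and the whole file is treated as speech (in practice, non-speech regions are usually dropped first, which reduces n).

```python
import math


def num_segments(duration_s, window_s=1.5, shift_s=0.75):
    """Count sliding windows over `duration_s` seconds of speech.

    window_s and shift_s are illustrative assumptions, not confirmed defaults.
    """
    if duration_s <= window_s:
        return 1
    return 1 + math.ceil((duration_s - window_s) / shift_s)


def relative_eig_cost(n_a, n_b):
    """Ratio of eigendecomposition cost for n_a vs n_b segments, assuming O(n^3)."""
    return (n_a / n_b) ** 3


n_90min = num_segments(90 * 60)  # segments to cluster for a 90-minute file
n_10min = num_segments(10 * 60)  # segments to cluster for a 10-minute file

print(f"90 min: {n_90min} segments, 10 min: {n_10min} segments")
print(f"estimated clustering cost ratio: {relative_eig_cost(n_90min, n_10min):.0f}x")
```

Under these assumptions, a file that is 9 times longer costs several hundred times more to cluster, which is why long recordings slow down in the clustering step rather than in embedding extraction.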

Sep 03 '22 21:09 tango4j

Thanks so much for the reply.

While experimenting further, I came up with another, similar question.

If a file is very short, say 2-4 seconds, will the embeddings be extracted accurately?

Is there a minimum limit of audio required for effective speaker embedding extraction?

Thanks!!

Sep 16 '22 12:09 sjsakshi

Technically, you can extract a speaker embedding from audio of any length. I would recommend a minimum speech length of 2 sec to get a better representation of the speaker. For diarization purposes, you may use smaller window lengths as well, provided you use them with multi-scale embedding extraction.
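
As a minimal sketch of what multi-scale segmentation means for a short clip, the snippet below cuts the same 3-second span into windows at several scales. The function `segment` and the (window, shift) pairs are hypothetical values picked for illustration, not NeMo's API or configuration; each resulting segment would then be passed to the embedding extractor.

```python
def segment(duration_s, window_s, shift_s):
    """Return (start, end) pairs for sliding windows that fit in the audio.

    If the audio is shorter than one window, the whole clip is one segment.
    """
    segments = []
    i = 0
    while i * shift_s + window_s <= duration_s:
        start = i * shift_s
        segments.append((start, start + window_s))
        i += 1
    if not segments:
        segments.append((0.0, duration_s))
    return segments


# Hypothetical scales as (window, shift) in seconds, longest to shortest.
scales = [(1.5, 0.75), (1.0, 0.5), (0.5, 0.25)]

for window_s, shift_s in scales:
    segs = segment(3.0, window_s, shift_s)
    print(f"window={window_s}s shift={shift_s}s -> {len(segs)} segments")
```

The longer windows give more stable speaker representations, while the shorter ones give finer time resolution; combining scales is what makes sub-2-second windows usable for diarization.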

Sep 21 '22 03:09 nithinraok

This issue is stale because it has been open for 30 days with no activity. Remove stale label or comment or this will be closed in 7 days.

Oct 22 '22 02:10 github-actions[bot]

This issue was closed because it has been inactive for 7 days since being marked as stale.

Oct 30 '22 02:10 github-actions[bot]
