NeMo
Speaker Diarization - Audio Length Limit
Hi,
Thanks for the prompt reply and for helping me with diarization of unknown speakers.
I just had a question: is there a limit on the length of an audio file that can be processed for speaker diarization, or by any speaker model in general, with NeMo?
If I have files that are 1.5 hours long, can they be diarized effectively?
Thanks!!
Speaker diarization can be done on a 90-minute audio file, but it could take a few minutes to get the result, since the clustering algorithm relies on eigenvalue decomposition, which has O(n^3) complexity in the number n of speaker-embedding segments.
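To illustrate why runtime grows quickly with audio length, here is a rough back-of-the-envelope sketch. It assumes a hypothetical sliding-window segmentation of 1.5 s windows with a 0.75 s shift (illustrative values, not necessarily NeMo's configuration) and counts how many embeddings, and therefore affinity-matrix rows, a file produces. The helper names are hypothetical; the cost model is simply the O(n^3) scaling of eigendecomposition.

```python
import math


def num_segments(duration_sec: float, window_sec: float = 1.5, shift_sec: float = 0.75) -> int:
    """Count sliding windows covering the audio (hypothetical segmentation).

    Assumes the last, possibly partial, window is kept, so any audio yields
    at least one segment.
    """
    if duration_sec <= window_sec:
        return 1
    return math.ceil((duration_sec - window_sec) / shift_sec) + 1


def relative_eig_cost(duration_a: float, duration_b: float) -> float:
    """Ratio of O(n^3) eigendecomposition cost for duration_b vs. duration_a."""
    n_a = num_segments(duration_a)
    n_b = num_segments(duration_b)
    return (n_b / n_a) ** 3


if __name__ == "__main__":
    for minutes in (10, 30, 90):
        n = num_segments(minutes * 60)
        print(f"{minutes:>3} min -> {n} segments, affinity matrix {n}x{n}")
    print(f"90 min vs 10 min cost ratio: ~{relative_eig_cost(600, 5400):.0f}x")
```

Under these assumptions, 9x more audio means roughly 9^3, i.e. several hundred times more eigendecomposition work, which is why long files are slow even though they are supported.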
Accuracy-wise, diarization accuracy does not necessarily depend on the length of the audio.
A divide-and-conquer feature for faster clustering is a work in progress (WIP).
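The thread only says a divide-and-conquer feature is in progress; the sketch below is a generic illustration of why splitting helps, not NeMo's design. If n embeddings are split into k equal chunks that are clustered independently, eigendecomposition cost drops from about n^3 to k(n/k)^3 = n^3/k^2. The cost of merging cluster labels across chunks, which a real implementation would need, is deliberately ignored here. Function names are hypothetical.

```python
def chunked_eig_cost(n: int, k: int) -> float:
    """Approximate O(n^3) cost when n embeddings are split into k equal chunks.

    Each chunk of size n/k is clustered independently, so the total cost is
    k * (n/k)^3 = n^3 / k^2. Merging labels across chunks is not modeled.
    """
    if n <= 0 or k <= 0:
        raise ValueError("n and k must be positive")
    chunk = n / k
    return k * chunk ** 3


def speedup(n: int, k: int) -> float:
    """Speedup of chunked clustering relative to one full eigendecomposition."""
    return n ** 3 / chunked_eig_cost(n, k)


if __name__ == "__main__":
    n = 7200  # hypothetical number of embeddings for a long recording
    for k in (1, 4, 8):
        print(f"k={k}: ~{speedup(n, k):.0f}x less eigendecomposition work")
```

The quadratic speedup in k is the main appeal; the trade-off is that speakers must be matched across chunks afterwards, which can introduce errors the single global clustering avoids.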
Thanks so much for the reply.
While I was experimenting further, I came up with another, related question.
If a file is very short (2-4 seconds), will the embeddings be extracted accurately?
Is there a minimum limit of audio required for effective speaker embedding extraction?
Thanks!!
Technically, you can extract a speaker embedding from audio of any length. I would recommend a minimum speech length of 2 seconds to get a better representation of the speaker. For diarization purposes, you may also use smaller window lengths if you combine them with multi-scale embedding extraction.
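For short clips, here is a minimal sketch of what multi-scale segmentation looks like. The (window, shift) pairs are illustrative assumptions, not values given in this thread, and `segments_for_scale` / `multiscale_segments` are hypothetical helpers rather than NeMo functions. The sketch shows that a 3-second clip yields only a few windows at the longest scale but many more at shorter scales, and it checks the clip against the 2-second minimum suggested above.

```python
from typing import Dict, List, Tuple

MIN_SPEECH_SEC = 2.0  # recommended minimum speech length from the reply above

# Hypothetical multi-scale settings: (window length, shift), both in seconds.
SCALES: List[Tuple[float, float]] = [(1.5, 0.75), (1.0, 0.5), (0.5, 0.25)]


def segments_for_scale(duration: float, window: float, shift: float) -> List[Tuple[float, float]]:
    """Return (start, end) windows covering [0, duration], clipping the last one."""
    if window <= 0 or shift <= 0:
        raise ValueError("window and shift must be positive")
    segments = []
    start = 0.0
    while True:
        end = min(start + window, duration)
        segments.append((round(start, 3), round(end, 3)))
        if end >= duration:
            break
        start += shift
    return segments


def multiscale_segments(duration: float) -> Dict[float, List[Tuple[float, float]]]:
    """Map each window length to its segments for a clip of the given duration."""
    if duration <= 0:
        raise ValueError("duration must be positive")
    return {w: segments_for_scale(duration, w, s) for w, s in SCALES}


if __name__ == "__main__":
    clip = 3.0
    if clip < MIN_SPEECH_SEC:
        print("warning: clip is shorter than the recommended minimum for embeddings")
    for window, segs in multiscale_segments(clip).items():
        print(f"window {window}s: {len(segs)} segments")
```

Shorter windows give finer time resolution but each embedding sees less speech, which is why a multi-scale approach combines them with longer windows instead of relying on short windows alone.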
This issue is stale because it has been open for 30 days with no activity. Remove stale label or comment or this will be closed in 7 days.
This issue was closed because it has been inactive for 7 days since being marked as stale.