
Train Speaker Diarization model

Open AMITKESARI2000 opened this issue 3 years ago • 2 comments

Hello, can you guide me on how to train and fine-tune the Speaker Diarization model shown in https://github.com/NVIDIA/NeMo/blob/main/tutorials/speaker_tasks/Speaker_Diarization_Inference.ipynb? I was unable to find any documentation on this. Thanks.

AMITKESARI2000 • Jul 09 '22 20:07

Hello, if you want to train a speaker embedding model (not VAD), this guide will be helpful: https://github.com/NVIDIA/NeMo/blob/main/tutorials/speaker_tasks/Speaker_Identification_Verification.ipynb Note the "Speaker Verification" part: it's important to change the loss.
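To make the point about changing the loss concrete, here is a minimal NumPy sketch of an additive angular margin softmax loss, the kind of loss commonly used for speaker verification instead of plain cross-entropy. This is a generic illustration and not NeMo's implementation; the function name `angular_margin_loss` and the default `scale` and `margin` values are assumptions chosen for the example.

```python
import numpy as np


def angular_margin_loss(embeddings, weights, labels, scale=30.0, margin=0.2):
    """Additive angular margin softmax loss (generic sketch, not NeMo code).

    embeddings: (batch, dim) speaker embeddings
    weights:    (n_speakers, dim) one class vector per training speaker
    labels:     (batch,) integer speaker ids
    """
    labels = np.asarray(labels)
    # L2-normalize both sides so that the logits are cosines of angles.
    emb = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    w = weights / np.linalg.norm(weights, axis=1, keepdims=True)
    cos = np.clip(emb @ w.T, -1.0, 1.0)
    theta = np.arccos(cos)

    # Add the margin to the angle of the true speaker only, which makes
    # the target class harder to satisfy and tightens each speaker cluster.
    rows = np.arange(len(labels))
    logits = cos.copy()
    logits[rows, labels] = np.cos(theta[rows, labels] + margin)
    logits *= scale

    # Numerically stable log-softmax followed by cross-entropy.
    logits -= logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-log_probs[rows, labels].mean())
```

With `margin=0.0` this reduces to a scaled cosine softmax; a positive margin penalizes embeddings that are only barely closer to their own speaker than to others, which is the property a verification model needs.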

ValeryNikiforov • Jul 12 '22 11:07

Hi, thanks for the reply! I checked out the notebook. I want to train Speaker Diarization on my custom dataset, but I am not able to reproduce the steps, as they are a bit confusing. Could you please outline a rough path for training Speaker Diarization (with non-oracle VAD), or better still, give an example in Colab? I think proper steps would help others as well.

AMITKESARI2000 • Jul 18 '22 20:07

Offline Speaker Diarization in NeMo is currently clustering-based, so to train SD you need to train a VAD model and a speaker embedding extractor, as explained here: https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/stable/asr/speaker_diarization/datasets.html
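As a rough illustration of what "clustering-based" means here: the VAD supplies speech segments, the embedding extractor turns each segment into a vector, and a clustering step groups those vectors into speakers. The sketch below assumes the segments and embeddings are already computed and uses simple average-linkage agglomerative clustering on cosine similarity. That is a stand-in chosen for brevity, not NeMo's clustering backend, and `cluster_segments`, `diarize`, and `threshold` are hypothetical names.

```python
import numpy as np


def cluster_segments(embeddings, threshold=0.5):
    """Group segment embeddings by average-linkage cosine similarity."""
    emb = np.asarray(embeddings, dtype=float)
    emb = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    sim = emb @ emb.T  # pairwise cosine similarity between segments

    # Start with one cluster per segment and merge the closest pair
    # until no pair is more similar than the threshold.
    clusters = [[i] for i in range(len(emb))]
    while len(clusters) > 1:
        best, pair = -np.inf, None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                s = sim[np.ix_(clusters[a], clusters[b])].mean()
                if s > best:
                    best, pair = s, (a, b)
        if best < threshold:
            break
        a, b = pair
        clusters[a] += clusters.pop(b)

    labels = np.empty(len(emb), dtype=int)
    for speaker, members in enumerate(clusters):
        labels[members] = speaker
    return labels


def diarize(segments, embeddings, threshold=0.5):
    """Attach a speaker label to each (start, end) speech segment."""
    labels = cluster_segments(embeddings, threshold)
    return [(start, end, f"speaker_{k}")
            for (start, end), k in zip(segments, labels)]


if __name__ == "__main__":
    # Toy input: four VAD segments with made-up 2-D embeddings.
    segments = [(0.0, 1.5), (1.5, 3.0), (3.2, 4.8), (5.0, 6.0)]
    embeddings = [[1.0, 0.1], [0.9, 0.0], [0.0, 1.0], [0.1, 0.9]]
    for start, end, speaker in diarize(segments, embeddings):
        print(f"{start:.1f}-{end:.1f}s {speaker}")
```

Since only the VAD and the embedding extractor have trainable parameters in such a pipeline, those are the two models to train or fine-tune; the clustering step itself is not trained.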

For training MSDD models, documentation will be added soon.

nithinraok • Sep 21 '22 04:09