pyannote-audio

Using Noise-reduce before inference increases DER

Open AMITKESARI2000 opened this issue 3 years ago • 2 comments

Hi, thanks for the repo. I was trying to remove noise and background clutter from the audio before feeding it into the speaker diarization pipeline, so that VAD can detect speaker turns more easily. But the DER seems to increase, I think because the hypothesis generated from the denoised audio has different timestamps than the ground-truth RTTM file. Any idea why this happens and how to go about removing noise? If not this, is there any preprocessing step that would improve DER?

AMITKESARI2000, Jul 31 '22 20:07

Any idea why this happens and how to go about removing noise?

It most likely happens because of a mismatch between (original) training data and (denoised) test data. Denoising will inevitably generate artifacts that have never been seen during training -- resulting in a larger mismatch between training and test data.

If not this, any pre processing step to improve DER?

Any pre-processing will most likely have the same issue. I usually recommend fine-tuning the model on the target domain instead.
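To check whether a preprocessing step such as denoising helps or hurts, it can be useful to score the original and the preprocessed hypotheses against the same reference. The following is a minimal sketch of a frame-based DER in plain Python, not the implementation used by pyannote. It assumes segments are given as `(start, end, speaker)` tuples in seconds, applies no collar, and finds the speaker mapping by brute force, so it is only practical for a handful of speakers. The names `to_frames` and `frame_der` are hypothetical, and the segments in the usage lines are toy values, not results from this issue.

```python
from itertools import permutations


def to_frames(segments, n_frames, step):
    """Rasterize (start, end, speaker) segments into one speaker set per frame."""
    frames = [set() for _ in range(n_frames)]
    for start, end, speaker in segments:
        first = max(int(round(start / step)), 0)
        last = min(int(round(end / step)), n_frames)
        for i in range(first, last):
            frames[i].add(speaker)
    return frames


def frame_der(reference, hypothesis, duration, step=0.01):
    """Simplified DER: (missed + false alarm + confusion) / reference speech."""
    n_frames = int(round(duration / step))
    ref = to_frames(reference, n_frames, step)
    hyp = to_frames(hypothesis, n_frames, step)

    total = sum(len(r) for r in ref)
    if total == 0:
        raise ValueError("reference contains no speech")

    # Count the frames in which each (reference, hypothesis) label pair co-occurs.
    cooc = {}
    for r, h in zip(ref, hyp):
        for a in r:
            for b in h:
                cooc[(a, b)] = cooc.get((a, b), 0) + 1

    # Pad the shorter label list so that every one-to-one mapping is a permutation.
    ref_labels = sorted({s for _, _, s in reference})
    hyp_labels = sorted({s for _, _, s in hypothesis})
    k = max(len(ref_labels), len(hyp_labels))
    ref_padded = ref_labels + [None] * (k - len(ref_labels))
    hyp_padded = hyp_labels + [None] * (k - len(hyp_labels))

    # Brute-force search for the mapping that maximizes correctly attributed frames.
    best_correct = max(
        sum(cooc.get(pair, 0) for pair in zip(ref_padded, perm))
        for perm in permutations(hyp_padded)
    )

    # Per frame, the error is max(#ref, #hyp) minus the correctly mapped speakers.
    slots = sum(max(len(r), len(h)) for r, h in zip(ref, hyp))
    return (slots - best_correct) / total


# Toy example: the second hypothesis places the speaker change one second late.
reference = [(0.0, 5.0, "A"), (5.0, 10.0, "B")]
hyp_original = [(0.0, 5.0, "s1"), (5.0, 10.0, "s2")]
hyp_shifted = [(0.0, 6.0, "s1"), (6.0, 10.0, "s2")]
print(frame_der(reference, hyp_original, duration=10.0))
print(frame_der(reference, hyp_shifted, duration=10.0))
```

In this sketch any shift in a predicted boundary is counted as an error, which is why a preprocessing step that moves the detected speaker turns can raise DER even when the audio sounds cleaner.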

hbredin, Aug 09 '22 12:08

Thanks for the clarification! Also, would fine-tuning the model mean fine-tuning all three models separately and then using them in the pipeline? Or is there some way to fine-tune the entire pipeline on my dataset in one go?

AMITKESARI2000, Aug 10 '22 17:08

Closing as I think the original question has been answered. Please open a new issue for other questions (but please read existing open & closed issues beforehand, and also read the FAQ section of the main README)

hbredin, Aug 12 '22 07:08