Implement voicefixer for audio enhancement
Is there any way to integrate voicefixer into the speaker diarization pipeline? The package takes a wav file as input and produces an upsampled 44.1 kHz wav file as output, but it could easily be modified to take and return an audio numpy array instead. Since the speaker embeddings depend heavily on the quality of the input audio, and in real-world environments many factors can degrade that quality (the recording device, the speaker's voice changing over time, etc.), I think having some form of audio enhancement is a must.
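
As a rough illustration of what this could look like, here is a minimal sketch of a chunk-level wrapper around voicefixer's documented file-based `restore()` API. The `enhance_chunk` helper and the idea of calling it on each audio block before embedding are assumptions for discussion, not an existing diart hook, and the temp-file round trip is only a stand-in until the numpy-array adaptation mentioned above exists:

```python
import tempfile
from pathlib import Path

import numpy as np
import soundfile as sf
import librosa
from voicefixer import VoiceFixer

vf = VoiceFixer()  # load the pretrained enhancement model once

def enhance_chunk(waveform: np.ndarray, sample_rate: int, use_cuda: bool = False) -> np.ndarray:
    """Enhance a mono audio chunk and return it at the original sample rate.

    Hypothetical helper: writes the chunk to a temp wav, runs voicefixer's
    file-based restore(), then resamples the 44.1 kHz output back down.
    """
    with tempfile.TemporaryDirectory() as tmp:
        in_path = Path(tmp) / "in.wav"
        out_path = Path(tmp) / "out.wav"
        sf.write(in_path, waveform, sample_rate)
        # voicefixer writes an enhanced 44.1 kHz wav file
        vf.restore(input=str(in_path), output=str(out_path), cuda=use_cuda, mode=0)
        enhanced, enhanced_sr = sf.read(out_path)
    # resample back so the diarization pipeline sees its expected rate (e.g. 16 kHz)
    return librosa.resample(enhanced, orig_sr=enhanced_sr, target_sr=sample_rate)
```

For real-time use the disk round trip would add too much latency, so the in-memory numpy modification would be the way to go, but the same enhance-then-resample step would apply.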