VAD-python
VAD on singing voice?
I am trying to adapt this script to detect voice/silence segments in an audio file containing a source-separated singing voice signal obtained from http://github.com/sigsep/open-unmix-pytorch
I have some questions:
- Does it make sense to compute the threshold for each `data_window` independently, instead of having a fixed `speech_energy_threshold`? I would do that by computing the energy of the `data_window` signal, normalizing it and taking its mean value. If this value is equal to 0.0, I can label that segment as silence.
- Is there a clever way to choose parameters like `sample_window`, `sample_overlap` and `speech_window` that would be more appropriate for singing voice signals?
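To make the first question concrete, here is a rough sketch of the per-window check I have in mind. It is not code from this repository: `mean_window_energy`, `label_silence`, `window_len` and `tolerance` are names I made up for illustration. I am assuming the samples are floats already scaled to [-1, 1], so the squared samples count as "normalized" energy, and I compare against a small tolerance because I suspect an exact comparison with 0.0 would almost never fire on a separated stem.

```python
import numpy as np


def mean_window_energy(data_window):
    """Mean of the squared samples in one window (hypothetical helper).

    Assumes float samples in [-1, 1], so the result lies in [0, 1].
    """
    samples = np.asarray(data_window, dtype=np.float64)
    if samples.size == 0:
        return 0.0
    return float(np.mean(samples ** 2))


def label_silence(signal, window_len, tolerance=1e-8):
    """Return one boolean per non-overlapping window; True means silence.

    `tolerance` is an assumed value, not a parameter of this script.
    """
    signal = np.asarray(signal, dtype=np.float64)
    labels = []
    for start in range(0, len(signal), window_len):
        window = signal[start:start + window_len]
        # A window counts as silent when its mean energy is (almost) zero.
        labels.append(mean_window_energy(window) <= tolerance)
    return labels
```

For example, calling `label_silence(signal, window_len=1024)` on a mono float array would give one silence flag per 1024-sample window. The windows here do not overlap, which is a simplification compared to what `sample_overlap` does in the script.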
Thanks a lot!