PiENet
Pitch estimation network (PiENet) for noise-robust neural F0 estimation of speech signals. The best-performing model from [1] ('GTE-AUG') is supplied as a pre-trained model. The model was trained using additive and convolutional noise augmentation, as well as vocoder-based ground truth enhancement (see the publication for further details).
Article found in: IEEE Xplore, ResearchGate
Licence
Distributed under the Apache 2.0 license. See LICENSE for further details.
Dependencies
- TensorFlow (tested with version 1.10)
- NumPy
- SciPy
How to use the pre-trained model:
The pre-trained model is trained for speech sampled at 16 kHz, with a 10 ms frame shift and an F0 range of 50 to 500 Hz.
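If your audio is at a different sampling rate, it should be resampled to 16 kHz before running the model. A minimal sketch using the SciPy/NumPy dependencies already required by the repo (file names are illustrative, not part of PiENet):

```python
# Sketch: resample an arbitrary-rate wav to the 16 kHz rate the model expects.
import numpy as np
from scipy.io import wavfile
from scipy.signal import resample_poly

TARGET_FS = 16000

fs, x = wavfile.read('input_44k.wav')        # hypothetical input file
if x.ndim > 1:
    x = x.mean(axis=1)                       # downmix to mono
if fs != TARGET_FS:
    x = resample_poly(x.astype(np.float64), TARGET_FS, fs)
wavfile.write('input_16k.wav', TARGET_FS, x.astype(np.int16))
```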
Method 1:
`python generate.py`: Performs pitch estimation for the `.wav` files specified in the text file `test_wavs.scp` and outputs the `.f0` files to the default folder `./f0/`.
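The `.scp` list is assumed here to be a plain text file with one wav path per line (as implied by "text file" above). A small sketch for generating it:

```python
# Sketch (assumption: the .scp list is plain text, one .wav path per line).
import glob

wav_paths = sorted(glob.glob('my_wavs/*.wav'))   # 'my_wavs' is illustrative
with open('test_wavs.scp', 'w') as f:
    f.write('\n'.join(wav_paths) + '\n')
```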
Method 2:
- `python generate.py input_file.wav`: Performs pitch estimation for `input_file.wav`, output to `./f0/`.
- `python generate.py input_list.scp`: Performs pitch estimation for all files specified in the text file `input_list.scp`, output to `./f0/`.
- `python generate.py input_dir`: Performs pitch estimation for all `.wav` files found in the folder `input_dir`, output to `./f0/`.
Method 3:
`python generate.py [input_file/input_list/input_dir] target_dir`: Same options as Method 2, output to `target_dir`.
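The training instructions below state that F0 files are raw float32; assuming the generated `.f0` outputs use the same format (and are named after the input wav, which is an assumption), they can be inspected with NumPy:

```python
# Sketch: load an estimated F0 track, assuming raw float32 output
# (the format the training section specifies for F0 files).
import numpy as np

f0 = np.fromfile('f0/input_file.f0', dtype=np.float32)   # path illustrative
t = np.arange(len(f0)) * 0.010                           # 10 ms frame shift
print(f0.shape, f0.min(), f0.max())
```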
How to train your own model:
- Supply the list of training wav files in `train_wavs.scp` for the input data, and the corresponding target F0 files in `train_f0s.scp` (F0 files in raw float32 format). Note that the number of frames in each F0 file must match the method's framing convention (Nframes = ceil(wav_length/hop), with the signal zero-padded by winlen/2 samples at the beginning and end); see the sketch after this list.
- If you want to use additive noise augmentation with noise sampled from a database, supply the noise file paths in `noiselist.scp` and edit line 54 of `train.py` to `noise_samples = augmentation.load_noise_samples(noise_wav_scp=noiselist.scp)`.
- For vocoder-based augmentation techniques, you need to download and set up the vocoder of your choice. The GlottDNN vocoder used in [1] can be found at https://github.com/ljuvela/GlottDNN. For ground truth enhancement, process the training wavs offline with the vocoder. The code for diversity augmentation is not provided in the current release of the method.
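As a concrete illustration of the framing convention and the raw float32 target format mentioned in the first item, the sketch below writes a target F0 file with a matching frame count. The hop of 160 samples follows from the 10 ms shift at 16 kHz; the file paths and the placeholder F0 array are illustrative, not values taken from the code.

```python
# Sketch: write a ground-truth F0 file whose frame count matches the stated
# framing convention: Nframes = ceil(wav_length / hop).
import numpy as np
from scipy.io import wavfile

hop = 160                                    # 10 ms at 16 kHz
fs, x = wavfile.read('train/utt0001.wav')    # hypothetical training wav
n_frames = int(np.ceil(len(x) / hop))

my_f0 = np.zeros(n_frames)                   # replace with your reference F0 track
assert len(my_f0) == n_frames
my_f0.astype(np.float32).tofile('train/utt0001.f0')   # raw float32, no header
```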
Reference:
[1] M. Airaksinen, L. Juvela, P. Alku and O. Räsänen: "Data augmentation strategies for neural F0 estimation", Proc. ICASSP 2019
Available: IEEE Xplore, ResearchGate