Srikanth Ronanki
As an alternative, try replacing `dtw_aligner_festvox.py` with `dtw_aligner.py` on [line 60](https://github.com/CSTR-Edinburgh/merlin/blob/master/egs/voice_conversion/s1/03_align_src_with_target.sh#L60).
The current setup doesn't use lf0 stats for pitch transformation, so you can ignore this error and proceed. However, if your features are not extracted properly, you may...
Yeah, thank you for the reminder. Anyway, I'll remove this code, as we're not using the stats for the final F0 transformation.
Make sure the number of frames is the same in each of lf0, bap and mgc. Use `x2x` from SPTK to find the number of frames: ./tools/bin/SPTK-3.9/x2x +fa lf0/cmu_us_arctic_slt_text_01001.lf0 |...
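
The frame-count check can also be done without SPTK by dividing each file's size by the bytes per frame. This is only a minimal sketch: it assumes the features are stored as raw float32 values, and the per-frame dimensions in `ASSUMED_DIMS` (lf0 = 1, bap = 1, mgc = 60) and the `<dir>/<feature>/<utt_id>.<feature>` layout are assumptions that you should adjust to your own setup. The function names are hypothetical helpers, not part of Merlin.

```python
import os

# Assumed per-frame dimensions (not stated in the comment above);
# change these to match your own feature extraction settings.
ASSUMED_DIMS = {"lf0": 1, "bap": 1, "mgc": 60}
BYTES_PER_FLOAT = 4  # assumes raw float32 feature files


def count_frames(path, dim):
    """Return the number of frames in a raw float32 feature file."""
    size = os.path.getsize(path)
    frame_bytes = dim * BYTES_PER_FLOAT
    if size % frame_bytes != 0:
        raise ValueError(
            f"{path}: size {size} is not a multiple of {frame_bytes} bytes"
        )
    return size // frame_bytes


def frame_counts(base_dir, utt_id, dims=ASSUMED_DIMS):
    """Map each feature name to its frame count for one utterance."""
    return {
        name: count_frames(os.path.join(base_dir, name, f"{utt_id}.{name}"), dim)
        for name, dim in dims.items()
    }


def frames_match(counts):
    """True when every feature stream has the same number of frames."""
    return len(set(counts.values())) == 1
```

For example, `frames_match(frame_counts(".", "cmu_us_arctic_slt_text_01001"))` should be true for a correctly extracted utterance, provided the assumed dimensions are right for your data.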
This is the script you should use: https://github.com/CSTR-Edinburgh/merlin/blob/master/misc/scripts/vocoder/world/extract_features_for_merlin.sh Also set the sampling frequency to either 16000 Hz or 48000 Hz to match the data you are using, as the default value is 16000 Hz: https://github.com/CSTR-Edinburgh/merlin/blob/master/misc/scripts/vocoder/world/extract_features_for_merlin.sh#L31...
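
If you are unsure which sampling frequency your recordings use, one way to check is to read it from the WAV header. The sketch below uses only Python's standard `wave` module and assumes uncompressed PCM WAV files; `wav_sample_rate` and `is_supported_rate` are hypothetical helper names, not part of Merlin.

```python
import wave

# The two sampling frequencies mentioned above.
SUPPORTED_RATES = (16000, 48000)


def wav_sample_rate(path):
    """Return the sampling rate in Hz stored in a PCM WAV header."""
    with wave.open(path, "rb") as wav:
        return wav.getframerate()


def is_supported_rate(path):
    """True when the file's sampling rate is 16000 Hz or 48000 Hz."""
    return wav_sample_rate(path) in SUPPORTED_RATES
```

Files at any other rate would need to be resampled before setting the value in the script.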
If you are having trouble with HTK, try this alternative:
- change `Labels=state_align` to `Labels=phone_align` in `conf/global_settings.cfg`
- run both steps of `02_prepare_labels.sh`
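
The config change can be made by hand in any editor. If you prefer to script it, here is a minimal sketch that rewrites the `Labels=` line in the config text. It assumes the file contains a plain `Labels=<value>` line; `switch_label_type` is a hypothetical helper, not something Merlin provides.

```python
import re


def switch_label_type(cfg_text, new_value="phone_align"):
    """Return cfg_text with the value of the `Labels=` setting replaced."""
    pattern = re.compile(r"^([ \t]*Labels[ \t]*=[ \t]*)\S+", flags=re.MULTILINE)
    new_text, count = pattern.subn(lambda m: m.group(1) + new_value, cfg_text)
    if count == 0:
        raise ValueError("no 'Labels=' line found in the config text")
    return new_text


def update_config(path="conf/global_settings.cfg"):
    """Rewrite the config file in place with Labels=phone_align."""
    with open(path) as f:
        text = f.read()
    with open(path, "w") as f:
        f.write(switch_label_type(text))
```

After calling `update_config()`, rerun both steps of `02_prepare_labels.sh` as described above.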
Is it possible to have `input_shape=(None, input_dimension)` and `output_shape=(None, output_dimension)`, so that we can provide variable-length input and get output of the desired length? Is a fixed-length output compulsory?