Ryuichi Yamamoto
https://github.com/r9y9/wavenet_vocoder/blob/8cc0c2dc28b2e7e0e6cafa02995b18be9e955df9/datasets/wavallin.py#L97-L100 If you use our preprocessing script, upsampling is expected to work correctly. I'm not really sure what issue you are hitting. You might want to try debugging with pdb or ipdb...
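For reference, here is a minimal debugging sketch (not the repo's code; the file names and hop size are assumptions) of the kind of sanity check you could run under pdb/ipdb to see whether the waveform and conditional-feature lengths actually line up after preprocessing:

```python
# Minimal debugging sketch (hypothetical file names and hop size, not the repo's code):
# check that the saved waveform and mel-spectrogram lengths are consistent,
# i.e. that the conditional features can be upsampled to the waveform length.
import numpy as np

hop_size = 256                                # assumed hop size
wav = np.load("audio-00001.npy")              # hypothetical preprocessed waveform
mel = np.load("mel-00001.npy")                # hypothetical mel features (frames x n_mels)

print("wav samples:", len(wav), "mel frames:", mel.shape[0])

# Drop into the debugger if the lengths don't match up.
if len(wav) != mel.shape[0] * hop_size:
    import ipdb; ipdb.set_trace()
```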
In short: the difference is whether teacher-forcing generation is used or not.
- `dev_eval`: Results for the development (validation) set. All waveforms are generated by autoregressive generation (i.e. inference mode).
- `train_no_dev_eval`: Results...
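To make the distinction concrete, here is a minimal sketch (not the repo's code; `model.step` is a hypothetical one-sample-per-step interface) contrasting teacher-forced generation with autoregressive generation:

```python
# Minimal sketch (hypothetical model interface, not the repo's code) contrasting
# teacher-forced generation with autoregressive (inference-mode) generation.
import numpy as np

def generate_teacher_forced(model, target):
    # Teacher forcing: at every step, feed the *ground-truth* previous sample.
    out, prev = [], 0.0
    for t in range(len(target)):
        out.append(model.step(prev))
        prev = target[t]            # feed the true previous sample
    return np.array(out)

def generate_autoregressive(model, length):
    # Autoregressive generation: feed the model's *own* previous output back in.
    out, prev = [], 0.0
    for _ in range(length):
        prev = model.step(prev)     # feed back the prediction
        out.append(prev)
    return np.array(out)
```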
https://github.com/r9y9/wavenet_vocoder#4-synthesize-from-a-checkpoint > --length=: (Un-conditional WaveNet only) Number of time steps to generate. Well, that was intentional, though. Do you happen to need the `--length` option for conditional synthesis? I'm ok with...
Hi, sorry for the super delay. If possible, could you describe in more detail why the changes are needed?
Cython and NumPy should work on Windows without significant effort, IMO. That being said, I understand that there is some demand for a C++ implementation for embedded applications.
In typical DNN-based parametric TTS systems, F0 is linearly interpolated over unvoiced regions. There is some work using spline interpolation, but I think linear interpolation is good enough...
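A minimal sketch of that kind of linear interpolation (assuming unvoiced frames are marked with F0 = 0, which is a common convention but an assumption here, not something stated above):

```python
# Minimal sketch: linearly interpolate F0 over unvoiced regions.
# Assumes unvoiced frames are marked with F0 == 0 (an assumption, not from the original).
import numpy as np

def interp_f0(f0):
    f0 = np.asarray(f0, dtype=np.float64)
    voiced = f0 > 0
    if not voiced.any():
        return f0.copy(), voiced
    idx = np.arange(len(f0))
    # np.interp clamps values outside the voiced range to the edge values,
    # so leading/trailing unvoiced frames get the nearest voiced F0.
    continuous = np.interp(idx, idx[voiced], f0[voiced])
    return continuous, voiced

f0 = np.array([0, 0, 120, 125, 0, 0, 130, 0])
continuous_f0, vuv = interp_f0(f0)
print(continuous_f0)  # [120. 120. 120. 125. 126.67 128.33 130. 130.]
```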
Yes. I think you will need to annotate the accent information (rising/falling tones) in the HTS-style labels.
Hello, @attitudechunfeng! > So does that mean I only need to replace the frontend part? Yes, you can reuse the other parts. You can also reuse part of the frontend...
Alternatively, you could consider an end-to-end approach, which requires neither alignment nor linguistic feature extraction (the hard part of TTS!). See https://github.com/r9y9/deepvoice3_pytorch if you are interested.
I see. I hope you find something useful. Let me know if you find anything that should be improved.