Ryuichi Yamamoto
https://github.com/r9y9/wavenet_vocoder/blob/8cc0c2dc28b2e7e0e6cafa02995b18be9e955df9/datasets/wavallin.py#L97-L100 If you use our preprocessing script, upsampling is expected to work correctly. I'm not really sure what issue you are hitting. You might want to try debugging with pdb or ipdb...
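For reference, here is a minimal debugging sketch (not the repo's code; the file names and hop size are assumptions) of the kind of sanity check you could run under pdb/ipdb to see whether the waveform and conditional-feature lengths actually line up after preprocessing:

```python
# Minimal debugging sketch (hypothetical file names and hop size, not the repo's code):
# check that the saved waveform and mel-spectrogram lengths are consistent,
# i.e. that the conditional features can be upsampled to the waveform length.
import numpy as np

hop_size = 256                                # assumed hop size
wav = np.load("audio-00001.npy")              # hypothetical preprocessed waveform
mel = np.load("mel-00001.npy")                # hypothetical mel features (frames x n_mels)

print("wav samples:", len(wav), "mel frames:", mel.shape[0])

# Drop into the debugger if the lengths don't match up.
if len(wav) != mel.shape[0] * hop_size:
    import ipdb; ipdb.set_trace()
```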
In short: the difference is whether teacher-forcing generation is used or not.
- `dev_eval`: Results for the development (validation) set. All waveforms are generated by autoregressive generation (i.e. inference mode).
- `train_no_dev_eval`: Results...
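To make the distinction concrete, here is a minimal sketch (not the repo's code; `model.step` is a hypothetical one-sample-per-step interface) contrasting teacher-forced generation with autoregressive generation:

```python
# Minimal sketch (hypothetical model interface, not the repo's code) contrasting
# teacher-forced generation with autoregressive (inference-mode) generation.
import numpy as np

def generate_teacher_forced(model, target):
    # Teacher forcing: at every step, feed the *ground-truth* previous sample.
    out, prev = [], 0.0
    for t in range(len(target)):
        out.append(model.step(prev))
        prev = target[t]            # feed the true previous sample
    return np.array(out)

def generate_autoregressive(model, length):
    # Autoregressive generation: feed the model's *own* previous output back in.
    out, prev = [], 0.0
    for _ in range(length):
        prev = model.step(prev)     # feed back the prediction
        out.append(prev)
    return np.array(out)
```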
https://github.com/r9y9/wavenet_vocoder#4-synthesize-from-a-checkpoint > --length=: (Un-conditional WaveNet only) Number of time steps to generate. Well, that was intentional, though. Do you happen to need the `--length` option for conditional synthesis? I'm ok with...
Hi, sorry for the super delay. If possible, could you describe in more detail why the changes are needed?
Cython and NumPy should work on Windows without significant effort, IMO. That being said, I understand that there is some demand for a C++ implementation for embedded applications.
In typical DNN-based parametric TTS systems, F0 is linearly interpolated over unvoiced regions. There is some work using spline interpolation, but I think linear interpolation is good enough...
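A minimal sketch of that kind of linear interpolation (assuming unvoiced frames are marked with F0 = 0, which is a common convention but an assumption here, not something stated above):

```python
# Minimal sketch: linearly interpolate F0 over unvoiced regions.
# Assumes unvoiced frames are marked with F0 == 0 (an assumption, not from the original).
import numpy as np

def interp_f0(f0):
    f0 = np.asarray(f0, dtype=np.float64)
    voiced = f0 > 0
    if not voiced.any():
        return f0.copy(), voiced
    idx = np.arange(len(f0))
    # np.interp clamps values outside the voiced range to the edge values,
    # so leading/trailing unvoiced frames get the nearest voiced F0.
    continuous = np.interp(idx, idx[voiced], f0[voiced])
    return continuous, voiced

f0 = np.array([0, 0, 120, 125, 0, 0, 130, 0])
continuous_f0, vuv = interp_f0(f0)
print(continuous_f0)  # [120. 120. 120. 125. 126.67 128.33 130. 130.]
```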
Yes. I think you will need to annotate the accent information (rising/falling tones) in the HTS-style labels.
Hello, @attitudechunfeng! > So does that mean I only need to replace the frontend part? Yes, you can reuse the other parts. You can also reuse part of the frontend...
Alternatively, you could consider an end-to-end approach, which requires neither alignment nor linguistic feature extraction (the hard part of TTS!). See https://github.com/r9y9/deepvoice3_pytorch if you are interested.
I see. I hope you find something useful. Let me know if you find anything that should be improved.