Why do you use interpolation in the pitch feature?
Hey, I didn't find any answer to this question. I have seen a few implementations of the pitch feature (with/without quantisation, with/without an unvoiced predictor), but all of them use interpolation at some point and I don't understand what for. If anyone knows, I would love to understand.
I looked at the implementation here and at the FastPitch paper, and I don't understand how the use of interpolation doesn't ruin the pitch prediction for the unvoiced parts of the audio. It seems that FastSpeech 2 with FastPitch's pitch predictor doesn't care whether the value of the target pitch was 0 before the interpolation, although that is very important for the audio.
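For reference, my understanding of what the interpolation does is roughly this: unvoiced frames (where the pitch tracker returns 0) are filled by linearly interpolating between the surrounding voiced frames, so the target contour becomes continuous. This is only an illustrative NumPy sketch, not the actual espnet code:

```python
import numpy as np

def to_continuous_f0(f0: np.ndarray) -> np.ndarray:
    """Fill unvoiced (zero) frames by linear interpolation between voiced frames.

    Illustrative only; the real extractor may handle edge cases differently.
    """
    f0 = f0.copy()
    voiced = f0 > 0
    if not voiced.any():
        return f0  # fully unvoiced utterance: nothing to interpolate
    idx = np.arange(len(f0))
    # Gaps between voiced frames are filled linearly; leading/trailing zeros
    # simply copy the first/last voiced value.
    f0[~voiced] = np.interp(idx[~voiced], idx[voiced], f0[voiced])
    return f0

# Example: the zeros in the middle are replaced by a straight line
# between 120 Hz and 180 Hz, and the leading zero copies the first voiced value.
print(to_continuous_f0(np.array([0.0, 120.0, 0.0, 0.0, 180.0, 170.0])))
# -> [120. 120. 140. 160. 180. 170.]
```

With such a target, the pitch predictor never sees a 0 for unvoiced regions, which is exactly what worries me.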
Actually, I've never strictly compared raw f0 with continuous f0 in the token-averaged case. There is an option to change the f0 type, so you can try it.
https://github.com/espnet/espnet/blob/6a46986cab33e401aaf730f066aabfbf4c719090/espnet2/tts/feats_extract/dio.py#L47
If you report the effectiveness, that would be great.
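For example, switching off continuous f0 when instantiating the Dio extractor would look roughly like this. This is a sketch assuming the constructor arguments shown below; please double-check the exact names against the linked line:

```python
from espnet2.tts.feats_extract.dio import Dio

# Assumed parameter names; verify against the constructor in dio.py before use.
pitch_extractor = Dio(
    fs=22050,
    n_fft=1024,
    hop_length=256,
    f0min=80,
    f0max=400,
    use_token_averaged_f0=True,   # average f0 over each token's duration
    use_continuous_f0=False,      # keep 0 for unvoiced frames instead of interpolating
    use_log_f0=True,              # use log-f0 for voiced frames
)
```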
I'm working on a different repo, so it will be hard for me to report, but I still don't understand: isn't it a problem to use continuous f0? When does the model learn to distinguish between unvoiced and voiced parts?
Since we use data-driven durations derived from attention, the correspondence between the voiced/unvoiced decision and the unvoiced consonants is not perfect. Therefore, even if we exclude the unvoiced part when calculating token-averaged f0, as in FastPitch, I think the value for unvoiced consonants will not be 0. For that reason, I think there is no significant difference between continuous f0 and non-continuous f0. Instead of F0, the phoneme representation may inform the model of the difference between unvoiced and voiced parts.
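To illustrate the point about token averaging: when the attention-derived duration boundaries do not line up exactly with the voiced/unvoiced boundaries, a token that is nominally an unvoiced consonant usually still covers a few voiced (or interpolated) frames, so its average is not 0 anyway. A rough sketch, assuming per-token durations given in frames:

```python
import numpy as np

def token_averaged_f0(f0: np.ndarray, durations: list[int]) -> np.ndarray:
    """Average frame-level f0 over each token's duration (FastPitch-style).

    Zero (unvoiced) frames are excluded from each token's average; a token
    with no voiced frames gets 0. Illustrative sketch only.
    """
    averages = []
    start = 0
    for d in durations:
        seg = f0[start:start + d]
        voiced = seg[seg > 0]
        averages.append(voiced.mean() if len(voiced) else 0.0)
        start += d
    return np.array(averages)

# An "unvoiced" token whose attention-derived duration leaks into neighbouring
# voiced frames still receives a nonzero average.
f0 = np.array([150.0, 155.0, 0.0, 0.0, 0.0, 160.0, 165.0, 170.0])
durations = [2, 4, 2]  # e.g. vowel, unvoiced consonant (imprecise boundaries), vowel
print(token_averaged_f0(f0, durations))
# -> [152.5 160.  167.5]
```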