ZDisket

12 comments by ZDisket

@jinfagang Can you show the exact code you used to export?

It happens to me too, but I removed the 256 hop_size limitation and made it work with 48 kHz.

@ming024 Here's an [audio sample](https://u.smutty.horse/lwdzjkcckrz.wav). Now, I was exaggerating a bit when I said it performs excellently, but it's good compared to Tacotron2, which yielded a completely unusable model. I always upsample...

@ming024 There's something about your implementation specifically that makes it perform excellently on my small, hard datasets; I tried another implementation and its models are also unusable.

@ming024 No, the other implementation also uses durations from MFA-extracted phonemes, which I implemented using yours as a reference, except that in my equivalent of https://github.com/ming024/FastSpeech2/blob/e0a28e04db6631a4f9303a898b690ebf1ebea7fe/utils.py#L40 I used round() instead of...
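A minimal sketch of why round() versus int() can matter when turning MFA phone intervals (start and end in seconds) into frame durations. It assumes the int() variant truncates each phone's duration independently, so the fractional frame lost on every phone accumulates; it does not claim this is exactly what the linked utils.py does. The helper names, the 16 kHz sample rate with hop 160 (100 frames per second, chosen for readable numbers), and the interval values are all hypothetical.

```python
def frames_truncated_per_phone(intervals, sampling_rate, hop_length):
    # Truncate each phone's duration independently with int();
    # the fractional frame dropped on every phone accumulates.
    return [int((end - start) * sampling_rate / hop_length) for start, end in intervals]


def frames_rounded_boundaries(intervals, sampling_rate, hop_length):
    # Round each boundary to the nearest frame index, then take differences,
    # so errors do not accumulate: the sum equals the rounded end frame.
    def to_frame(t):
        return int(round(t * sampling_rate / hop_length))

    return [to_frame(end) - to_frame(start) for start, end in intervals]


# Hypothetical phone intervals (seconds) for a 0.4 s utterance -> 40 mel frames.
intervals = [(0.0, 0.127), (0.127, 0.256), (0.256, 0.4)]
print(frames_truncated_per_phone(intervals, 16000, 160))  # [12, 12, 14] (sum 38)
print(frames_rounded_boundaries(intervals, 16000, 160))   # [13, 13, 14] (sum 40)
```

Even on three phones the truncated version drifts by 2 frames; over a long utterance with many phones, drift of tens of frames is plausible.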

@ming024 In my MFA implementation I also have some code that corrects mismatches between durations and mel lengths. The durations calculated via int() mismatch by about 20 to 50 frames each, while...
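The comment doesn't say how the mismatch is corrected, so here is one hypothetical approach, sketched under the assumption that any surplus or deficit should be absorbed by the trailing phones (often trailing silence). The function name `match_mel_length` is illustrative, not from either repo.

```python
def match_mel_length(durations, mel_length):
    """Return a copy of durations adjusted so that sum(result) == mel_length.

    Assumption: a surplus of mel frames is added to the last phone; a deficit
    is taken from the last phones, walking backwards so no duration goes negative.
    """
    if not durations:
        raise ValueError("durations must not be empty")
    fixed = list(durations)
    diff = mel_length - sum(fixed)
    if diff > 0:
        # Durations are too short: give the extra frames to the final phone.
        fixed[-1] += diff
        return fixed
    i = len(fixed) - 1
    while diff < 0 and i >= 0:
        # Durations are too long: trim from the end without going below zero.
        take = min(fixed[i], -diff)
        fixed[i] -= take
        diff += take
        i -= 1
    if diff != 0:
        raise ValueError("mel_length must be non-negative")
    return fixed


print(match_mel_length([12, 12, 14], 40))  # [12, 12, 16]
print(match_mel_length([5, 3, 2], 6))      # [5, 1, 0]
```

Dumping the whole correction on the last phones is simple but can leave a phone with zero frames when the deficit is large; spreading the difference proportionally across phones is an alternative design.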

@ming024 I really mean frames. As to why it happens, I don't know, since I didn't really explore that repo's preprocessing step.

@ming024 I have one: https://github.com/TensorSpeech/TensorflowTTS/issues/107#issuecomment-656447235

@kan-bayashi When you trained PWGAN up to 50k steps starting from a pretrained model, when did you turn on the discriminator? After a certain number of steps, or from the start? When I fine-tune a female voice...

@kan-bayashi VCTK has some male speakers; can we fine-tune a single male speaker starting from the multi-speaker model?