StyleSpeech
Why was a 16000 Hz sampling rate chosen for this research?
I have tried training several models at a 22050 Hz sampling rate, but they cannot match the quality of the 16000 Hz model. Can you explain why you used 16000 Hz in your research?
@KevinMIN95 help me please.
I chose a 16 kHz sampling rate to match the other baselines. I haven't tried 22050 Hz, but I think increasing the capacity of the model may help.
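One way to see why more capacity might help at 22050 Hz is to compare what the model has to cover at each rate. The sketch below is only back-of-the-envelope arithmetic: `HOP_LENGTH = 256` and the helper `audio_stats` are assumptions made for illustration, not values or functions taken from this repository.

```python
# Rough comparison of the two sampling rates discussed in this thread.
# HOP_LENGTH is an assumed value for illustration, not read from the repository.
HOP_LENGTH = 256


def audio_stats(sampling_rate: int, hop_length: int = HOP_LENGTH) -> dict:
    """Return the Nyquist frequency (Hz) and spectrogram frames per second."""
    return {
        "nyquist_hz": sampling_rate / 2,
        "frames_per_second": sampling_rate / hop_length,
    }


stats_16k = audio_stats(16000)
stats_22k = audio_stats(22050)

for rate, stats in ((16000, stats_16k), (22050, stats_22k)):
    print(rate, stats)
```

Under that assumed hop length, the 22050 Hz setup spans a wider frequency band and produces more frames per second of speech, which fits the suggestion that a larger model may be needed.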
Thanks, I will try. Maybe I will increase the encoder dimension from 256 to 384.
Did you try this?
Yep, I tried a hidden dimension of 384 with 6 FFT blocks, and it works.
@chazo1994 Great! Was the result better than 16kHz? Could you share the network and pretrained weights if possible?
@MrGolden1 The results are slightly better than at 16 kHz. I'm sorry, I cannot share the pretrained weights, but the network is the same as in this repository; only the hidden dimension and the number of FFT blocks are different.
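For anyone who wants to reproduce this, the change described above could be sketched roughly as follows. The class `ModelConfig`, its field names, and the helper `scale_for_22k` are hypothetical and do not match the repository's actual config keys. The number of FFT blocks in the 16 kHz baseline is not stated in this thread, so the value 4 below is only a placeholder.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ModelConfig:
    # Hypothetical field names; check the repository's config file for the real keys.
    sampling_rate: int
    encoder_hidden: int
    n_fft_blocks: int


def scale_for_22k(base: ModelConfig) -> ModelConfig:
    """Return a copy of `base` with the changes reported in this thread."""
    return replace(base, sampling_rate=22050, encoder_hidden=384, n_fft_blocks=6)


# The baseline block count is a placeholder; it is not given in the thread.
base_16k = ModelConfig(sampling_rate=16000, encoder_hidden=256, n_fft_blocks=4)
config_22k = scale_for_22k(base_16k)
print(config_22k)
```

Keeping the baseline config immutable and deriving the 22050 Hz variant from it makes it easy to train both and compare them side by side.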