StyleSpeech: Why was a 16000 Hz sampling rate chosen for this research?

Open chazo1994 opened this issue 3 years ago • 3 comments

I have tried training many models at a 22050 Hz sampling rate, but none of them can reproduce the quality of the 16000 Hz model. Can you explain why you used 16000 Hz in your research?

Jun 07 '22 09:06 chazo1994

@KevinMIN95 Please help me.

Aug 24 '22 04:08 chazo1994

I chose a 16 kHz sampling rate to match the other baselines. I haven't tried 22050 Hz, but I think increasing the capacity of the model may help.

Aug 24 '22 11:08 KevinMIN95

Thanks, I will try. Maybe I will increase the encoder dimension from 256 to 384.

Aug 25 '22 04:08 chazo1994

Did you try this?

Dec 03 '22 16:12 MrGolden1

Yes, I tried a hidden dimension of 384 with 6 FFT blocks, and it works.

Dec 07 '22 03:12 chazo1994

@chazo1994 Great! Was the result better than 16kHz? Could you share the network and pretrained weights if possible?

Dec 07 '22 10:12 MrGolden1

@MrGolden1 The results are slightly better than at 16 kHz. I'm sorry, I cannot share the pretrained weights, but the network is the same as in this repository; only the hidden dimension and the number of FFT blocks differ.

Dec 07 '22 10:12 chazo1994
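
For anyone trying the same 22050 Hz setup, here is a minimal sketch of the change discussed in this thread. Only the hidden dimension (256 to 384) and the 6 FFT blocks come from the comments above. The `AudioModelConfig` class, its field names, the 16 kHz hop and window sizes, and the baseline block count are hypothetical placeholders, not the repository's actual configuration. Scaling the hop and window sizes with the sampling rate is one possible choice and was not described by the posters.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class AudioModelConfig:
    # Hypothetical fields for illustration; not the repository's config keys.
    sampling_rate: int = 16000
    hop_length: int = 256      # assumed value
    win_length: int = 1024     # assumed value
    encoder_hidden: int = 256  # baseline hidden dimension mentioned in the thread
    n_fft_blocks: int = 4      # assumed baseline; the thread only states the new value (6)


def scale_to_sampling_rate(cfg: AudioModelConfig, new_rate: int) -> AudioModelConfig:
    """Scale hop and window sizes so each frame spans roughly the same duration."""
    ratio = new_rate / cfg.sampling_rate
    return replace(
        cfg,
        sampling_rate=new_rate,
        hop_length=round(cfg.hop_length * ratio),
        win_length=round(cfg.win_length * ratio),
    )


def frame_ms(cfg: AudioModelConfig) -> float:
    """Duration of one hop in milliseconds."""
    return 1000.0 * cfg.hop_length / cfg.sampling_rate


base = AudioModelConfig()
# Move to 22050 Hz, then apply the larger capacity reported to work in the thread.
larger = replace(scale_to_sampling_rate(base, 22050), encoder_hidden=384, n_fft_blocks=6)

print(base, frame_ms(base))
print(larger, frame_ms(larger))
```

The two printed frame durations should be nearly equal, which makes it easy to check that only the sampling rate and the model capacity changed between the two runs.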
