Chinese-FastSpeech2

The BERT details

[Open] mondorysix opened this issue 9 months ago • 0 comments

Thank you for sharing your work. I'm truly impressed by your project and would like to understand it more deeply, so if it's convenient for you, I have a few questions.

I noticed that you use BERT to extract prosodic features in your project. I've run some experiments of my own, but the BERT models I found on HuggingFace didn't give results as good or as natural as yours. I tried both the WWM (whole word masking) version and the large models, and neither worked very well. This has been confusing me, and I was hoping you could help clarify:

1. Did you train the BERT model yourself, or did you take it from Google?
2. Is it the WWM version, and did you make any modifications to it?
3. Have you fine-tuned it on any datasets other than Chinese Wikipedia?

I would greatly appreciate your insights on these matters.

mondorysix · May 15 '24 07:05