New MDCC Dataset

Open edwin0cheng opened this issue 2 years ago • 1 comments

Hi, I am a Hong Konger and find this project interesting, though I mainly work on open-source projects that are not related to AI.

I have done some research on Cantonese subtitle generation, though, and I know that the main obstacle is the lack of good datasets.

However, do you have a chance to take a look at this new dataset (2021)?

https://github.com/HLTCHKUST/cantonese-asr

I haven't tried it, but in their paper they say it gives better results than CommonVoice, so maybe you will be interested.

edwin0cheng · Oct 20 '22 12:10