cantonese-selfish-project
New MDCC Dataset
Hi, I am a Hongkonger and find this project interesting, although I mainly work on open-source projects that are not related to AI.
I have done some research on Cantonese subtitle generation, and I learned that the main obstacle is the lack of good datasets.
Have you had a chance to look at this new dataset (2021)?
https://github.com/HLTCHKUST/cantonese-asr
I haven't tried it myself, but in their paper the authors say it gives better results than CommonVoice, so you may find it interesting.