icefall
How to filter annotations (transcripts) and choose a suitable corpus for ASR
I plan to train an ASR model using my own data with wenetspeech in egs. I want to know how annotation quality (good or bad) and the recording scenario of a corpus, such as conference conversations versus read articles, influence my model, so that I can keep the corpora that help and delete the bad annotations that seriously harm it. Other influencing factors are welcome too. Thanks!