ShiLiang Zhang
Hi, I have updated the training scripts. Please switch to the GitHub project at https://github.com/tramphero/kaldi.
Generally speaking, a single GPU is dozens of times faster than a CPU, so I am afraid it would take you months to train this model on CPUs.
As for the source code, please switch to the GitHub project at https://github.com/tramphero/kaldi.
Sorry, we are currently unable to maintain the UniASR series of models.
This likely pertains to the performance issue of the general Chinese speech recognition model when dealing with mixed Chinese-English speech. We will enhance the corresponding performance in subsequent model iterations.
The current finetuning pipeline of FunASR does not support directly modifying the subword vocabulary to add OOV vocabulary for finetuning. If there is a need for this, the following modifications...
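Dialect and accent recognition is supported for Wu, Minnan (Southern Min), Northeastern Mandarin, Gansu, Guizhou, Henan, Hubei, Hunan, Ningxia, Shanxi, Shaanxi, Shandong, Sichuan, and Tianjin dialects. However, because the training data is fairly limited, you should evaluate the actual performance in your own scenario to see whether it meets your requirements.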
> @@ means the token is subword. You could concat them via: replace('@@ ', '')

Can we perhaps add the post-processing statements for handling subwords to the pipelines for all...
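
For anyone who wants to apply the quoted `replace('@@ ', '')` step on their own output in the meantime, here is a minimal sketch. The function name `merge_subwords` and the sample string are made up for illustration and are not part of FunASR. The handling of a dangling `@@` at the very end of the string is an extra assumption on my part, not something stated in the quoted comment.

```python
def merge_subwords(text: str) -> str:
    """Join subword pieces that are marked with a trailing '@@'.

    Hypothetical helper for illustration only; not a FunASR API.
    """
    # Core step from the quoted comment: drop each '@@ ' marker so the
    # piece is glued to the token that follows it.
    merged = text.replace("@@ ", "")
    # Assumption: a marker at the very end has no following space,
    # so strip it separately.
    if merged.endswith("@@"):
        merged = merged[:-2]
    return merged


if __name__ == "__main__":
    # Made-up example string, not real recognizer output.
    print(merge_subwords("hel@@ lo wor@@ ld"))
```

Tokens without the `@@` marker pass through unchanged, so the function should be safe to run on output that mixes whole words and subwords.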