welm
One command to build TLG.fst for WeNet.
WeLM
WeLM depends on:
- WeTextProcessing: Text Normalization
- jieba: Chinese Word Segmentation
- SentencePiece: English subword units via Byte-Pair Encoding (BPE)
- SRILM: N-Gram Language Model Training
- Some Kaldi tools: TLG Graph Building
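Before building, it can help to confirm that these dependencies are reachable. The sketch below is not part of WeLM: `require_cmds` and `require_pymods` are hypothetical helper names, and which binaries or modules to pass is up to your installation.

```shell
# Hypothetical helpers, not shipped with WeLM.

# Report every command in the argument list that is not on PATH.
# Returns non-zero if at least one is missing.
require_cmds() {
  missing=0
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      echo "missing command: $cmd" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Report every Python module in the argument list that python3 cannot import.
# Returns non-zero if at least one import fails.
require_pymods() {
  missing=0
  for mod in "$@"; do
    if ! python3 -c "import $mod" >/dev/null 2>&1; then
      echo "missing Python module: $mod" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

For example, `require_cmds ngram-count` looks for SRILM's n-gram training binary, and `require_pymods jieba sentencepiece` checks the two Python packages. The Kaldi tools are not named here, so add whichever binaries your setup relies on.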
Usage
- Prepare the acoustic model's unit file, units.txt.
$ wget https://path/to/aishell2/20210618_u2pp_conformer_libtorch.tar.gz
$ tar -xf 20210618_u2pp_conformer_libtorch.tar.gz
- Prepare the transcript file, trans.txt.
$ head -n1 trans.txt
BAC009S0002W0122 而对楼市成交抑制作用最大的限购
- Build the FST.
$ bash run.sh --unit_file units.txt --text trans.txt
$ ls exp/lang
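The two input files above can be sanity-checked before the build starts. This is a minimal sketch, not part of WeLM: `preflight` is a hypothetical helper, and it assumes every transcript line has the form `<utterance-id> <text>`, as in the `head -n1 trans.txt` example.

```shell
# Hypothetical preflight check, not shipped with WeLM.
# Usage: preflight UNIT_FILE TRANSCRIPT
preflight() {
  unit_file=$1
  text=$2

  # Both inputs must exist and be non-empty.
  for f in "$unit_file" "$text"; do
    if [ ! -s "$f" ]; then
      echo "missing or empty file: $f" >&2
      return 1
    fi
  done

  # Assumed layout: an utterance ID followed by at least one token of text.
  # Lines with fewer than two whitespace-separated fields are reported.
  awk 'NF < 2 { printf "%s:%d: expected <utt-id> <text>\n", FILENAME, NR; bad = 1 }
       END { exit (bad ? 1 : 0) }' "$text" >&2
}
```

It could be chained in front of the build, for instance `preflight units.txt trans.txt && bash run.sh --unit_file units.txt --text trans.txt`, so that a malformed transcript is caught before any training work is done.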