
One command to build TLG.fst for WeNet.

WeLM

WeLM depends on:

  1. WeTextProcessing: Text Normalization
  2. jieba: Chinese Word Segmentation
  3. SentencePiece: English Byte-Pair-Encoding (BPE) subword units
  4. SRILM: N-Gram Language Model Training
  5. Some tools of Kaldi: TLG Graph Building
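
Before building, it can help to confirm that the dependencies above are reachable. The following is a minimal sketch and not part of WeLM itself. The helper name `check_cmd` is hypothetical, and the binary names `ngram-count` (SRILM) and `arpa2fst` (Kaldi) are assumptions based on typical installs; run.sh may call other tools. WeTextProcessing is left out because its import name is not given here.

```shell
# Hypothetical helper: report whether a command is available on PATH.
check_cmd() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "ok: $1"
  else
    echo "missing: $1"
    return 1
  fi
}

# Assumed binary names for SRILM and Kaldi; adjust to your installation.
for tool in ngram-count arpa2fst; do
  check_cmd "$tool" || true
done

# Python packages are checked by trying to import them.
for mod in jieba sentencepiece; do
  if python3 -c "import $mod" >/dev/null 2>&1; then
    echo "ok: $mod"
  else
    echo "missing: $mod"
  fi
done
```

Anything reported as missing should be installed, or its directory added to PATH, before running the build.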

Usage

  1. Prepare the acoustic model's unit file units.txt.
$ wget https://path/to/aishell2/20210618_u2pp_conformer_libtorch.tar.gz
$ tar -xf 20210618_u2pp_conformer_libtorch.tar.gz
  2. Prepare the transcript file trans.txt.
$ head -n1 trans.txt
BAC009S0002W0122 而对楼市成交抑制作用最大的限购
  3. Build the FST.
$ bash run.sh --unit_file units.txt --text trans.txt
$ ls exp/lang
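
The transcript line shown above suggests that each line of trans.txt holds an utterance ID, whitespace, and then the text. Assuming that format (it is inferred from the single example, and run.sh may accept other layouts), a quick sanity check before building could look like the sketch below. The helper name `check_trans` is hypothetical.

```shell
# Hypothetical helper: list lines that lack either an utterance ID or text.
# Exits non-zero if any malformed line is found.
check_trans() {
  awk 'NF < 2 { printf "line %d: expected \"<utt_id> <text>\"\n", NR; bad = 1 }
       END { exit (bad ? 1 : 0) }' "$1"
}

# Demo on a temporary file that contains the example line from above.
tmp=$(mktemp)
printf '%s\n' 'BAC009S0002W0122 而对楼市成交抑制作用最大的限购' > "$tmp"
check_trans "$tmp" && echo "transcript format looks fine"
rm -f "$tmp"
```

Running `check_trans trans.txt` on the real file prints one message per malformed line, which is easier to act on than a failure partway through the build.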