indonesian-asr

Automatic speech recognition (ASR) for the Indonesian language, built with HTK and Julius. The web interface is built with Node.js.

Indonesian Automatic Speech Recognition

cd training

  1. Create MFCC files

python dataset/train

  1. Create Monophone Model
  2. Prototype
`python files` (optional; this step can be skipped)
  1. Wordlist
`HDMan -m -w wordlist/wlist -n monophones1 -l dlog dict wordlist/indonesian.lex`

Edit `dict` by adding these two entries:
SENT-END    sil
silence     sil

Insert each entry at the correct position so that the file remains sorted.
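
The dictionary edit above can also be scripted. The following is a minimal sketch, not the project's own tooling. It assumes `dict` has one entry per line with the headword as the first whitespace-separated field, and that sorting headwords by code point (uppercase before lowercase) matches the order HTK expects. The helper name `add_silence_entries` is hypothetical.

```python
# Sketch: add the two silence entries to an HTK dictionary and keep it sorted.
# Assumptions: one entry per line, headword first, plain code-point ordering.

SILENCE_ENTRIES = ["SENT-END    sil", "silence     sil"]


def add_silence_entries(dict_lines):
    """Return the dictionary lines with the silence entries added, sorted by headword."""
    entries = [line.rstrip("\n") for line in dict_lines if line.strip()]
    headwords = {entry.split()[0] for entry in entries}
    for extra in SILENCE_ENTRIES:
        if extra.split()[0] not in headwords:  # do not add an entry twice
            entries.append(extra)
    # sorted() is stable, so alternative pronunciations keep their relative order
    return sorted(entries, key=lambda entry: entry.split()[0])


if __name__ == "__main__":
    sample = ["aku a k u\n", "saya s a y a\n"]
    for entry in add_silence_entries(sample):
        print(entry)
```

Running the function on a dictionary that already contains both entries leaves it unchanged, so it is safe to apply more than once.
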

Run `python`

Create the following edit script, `mkphones0.led`, containing:
EX
IS sil sil
DE sp

Windows: `HLEd -l * -d wordlist/dict -i wordlist/phones0.mlf wordlist/mkphones0.led wordlist/words_sanitize.mlf` (this step is required)

Ubuntu: `HLEd -l '*' -d wordlist/dict -i wordlist/phones0.mlf wordlist/mkphones0.led wordlist/words_sanitize.mlf` (this step is required)

(This step is tedious: it reveals every word that is missing from `wlist` and `dict`, so cross-check them until the command runs cleanly.)
(ERROR [+6550]  LoadHTKLabels: Junk at end of HTK transcription -> remove every line that contains only whitespace, for example by deleting matches of the regex `^\n`)
(ERROR [+6550]  LoadHTKList: Label Name Expected -> caused by a label that is a number; HTK reads a leading number as a time stamp)
(ERROR [+1232]  NumParts: Cannot find word %s in dictionary -> fix each missing word by hand; this takes patience)
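
The three errors above can be looked for before running HLEd. The following is a rough sketch under stated assumptions: the word-level MLF uses the usual HTK layout (a `#!MLF!#` header, quoted label-file names, one word per line, and a `.` closing each utterance), and `dict_words` is the set of headwords from `dict`. The function names `check_mlf` and `drop_blank_lines` are hypothetical and are not part of this repository.

```python
# Sketch: pre-check a word-level MLF for the three HLEd errors listed above.
# Assumption: standard HTK MLF layout; dict_words holds the dict headwords.


def check_mlf(mlf_lines, dict_words):
    """Return (blank_line_numbers, numeric_labels, missing_words)."""
    blank, numeric, missing = [], [], set()
    for number, raw in enumerate(mlf_lines, start=1):
        line = raw.strip()
        if not line:
            blank.append(number)            # "Junk at end of HTK transcription"
        elif line == "#!MLF!#" or line.startswith('"') or line == ".":
            continue                        # header, label-file name, end marker
        elif line[0].isdigit():
            numeric.append((number, line))  # "Label Name Expected"
        elif line not in dict_words:
            missing.add(line)               # "Cannot find word ... in dictionary"
    return blank, numeric, sorted(missing)


def drop_blank_lines(mlf_lines):
    """Remove lines that are empty or contain only whitespace."""
    return [line for line in mlf_lines if line.strip()]


if __name__ == "__main__":
    sample = ['#!MLF!#\n', '"*/a.lab"\n', 'saya\n', ' \n', '2\n', 'makan\n', '.\n']
    print(check_mlf(sample, {"saya"}))
```

The missing-word list is the part that still needs manual work: each word has to be added to the lexicon or removed from the transcription.
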

Once all the errors above are fixed, clean the MLF and SCP files of any recordings that are not covered by `dict` at all.
This produces the `_sanitize` versions of the SCP and MLF files.
  1. HMM0-Init
`mkdir hmm0`

`HCompV -C config/conf-train -f 0.01 -m -S files/train_sanitize.scp -M hmm0 files/proto.hmm`

Create `monophones0` from `monophones1`, leaving out the `sp` entry.
Then create the `hmmdefs` and `macros` files.
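
One possible way to script these two steps is sketched below. It follows the flat-start convention used in the VoxForge tutorial, which is an assumption about how these files were actually built here: `hmmdefs` repeats the prototype body once per phone, and `macros` is the `~o` header of the prototype followed by the contents of `vFloors`. It also assumes the prototype written by HCompV contains a single model delimited by `<BEGINHMM>` and `<ENDHMM>`. All function names are hypothetical.

```python
# Sketch: derive monophones0 from monophones1 and build flat-start model files.
# Assumptions: the prototype holds one model between <BEGINHMM> and <ENDHMM>,
# and everything before its ~h line is the ~o options header.


def make_monophones0(monophones1):
    """Copy the phone list, leaving out the short-pause model 'sp'."""
    phones = [phone.strip() for phone in monophones1]
    return [phone for phone in phones if phone and phone != "sp"]


def make_hmmdefs(proto_text, phones):
    """Repeat the prototype body once per phone under a ~h "<phone>" header."""
    start = proto_text.index("<BEGINHMM>")
    end = proto_text.index("<ENDHMM>") + len("<ENDHMM>")
    body = proto_text[start:end]
    return "".join(f'~h "{phone}"\n{body}\n' for phone in phones)


def make_macros(proto_text, vfloors_text):
    """Prefix the variance floors with the ~o header taken from the prototype."""
    header = proto_text[:proto_text.index("~h")]
    return header + vfloors_text


if __name__ == "__main__":
    proto = '~o\n<VECSIZE> 39<MFCC_0_D_A>\n~h "proto"\n<BEGINHMM>\n<NUMSTATES> 5\n<ENDHMM>\n'
    print(make_hmmdefs(proto, make_monophones0(["a\n", "sp\n", "sil\n"])))
```
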
  1. Create Model

    1. HMM-1

    mkdir hmm1

    HERest -A -D -T 1 -C config/conf-train -I wordlist/phones0.mlf -t 250.0 150.0 1000.0 -S files/train_sanitize.scp -H hmm0/macros -H hmm0/hmmdefs -M hmm1 wordlist/monophones0

    1. HMM-2

    mkdir hmm2

    HERest -A -D -T 1 -C config/conf-train -I wordlist/phones0.mlf -t 250.0 150.0 1000.0 -S files/train_sanitize.scp -H hmm1/macros -H hmm1/hmmdefs -M hmm2 wordlist/monophones0

    1. HMM-3

    mkdir hmm3

    HERest -A -D -T 1 -C config/conf-train -I wordlist/phones0.mlf -t 250.0 150.0 1000.0 -S files/train_sanitize.scp -H hmm2/macros -H hmm2/hmmdefs -M hmm3 wordlist/monophones0

    1. HMM-4

    mkdir hmm4

    HERest -A -D -T 1 -C config/conf-train -I wordlist/phones0.mlf -t 250.0 150.0 1000.0 -S files/train_sanitize.scp -H hmm3/macros -H hmm3/hmmdefs -M hmm4 wordlist/monophones0

    1. HMM-5

    mkdir hmm5

    HERest -A -D -T 1 -C config/conf-train -I wordlist/phones0.mlf -t 250.0 150.0 1000.0 -S files/train_sanitize.scp -H hmm4/macros -H hmm4/hmmdefs -M hmm5 wordlist/monophones0

    1. HMM-6

    mkdir hmm6

    HERest -A -D -T 1 -C config/conf-train -I wordlist/phones0.mlf -t 250.0 150.0 1000.0 -S files/train_sanitize.scp -H hmm5/macros -H hmm5/hmmdefs -M hmm6 wordlist/monophones0

    1. HMM-7

    mkdir hmm7

    HERest -A -D -T 1 -C config/conf-train -I wordlist/phones0.mlf -t 250.0 150.0 1000.0 -S files/train_sanitize.scp -H hmm6/macros -H hmm6/hmmdefs -M hmm7 wordlist/monophones0
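
The seven rounds above differ only in the directory numbers, so they can be generated in a loop. This is a minimal sketch rather than a script from this repository: the function names are hypothetical, the paths are the ones used in the commands above, and by default it only prints the commands. Passing `execute=True` assumes HTK is installed and on the `PATH`.

```python
# Sketch: generate (and optionally run) one HERest command per training round.
# Assumption: paths and options mirror the commands listed in this README.
import os
import shlex
import subprocess


def herest_command(round_number, mlf="wordlist/phones0.mlf",
                   phones="wordlist/monophones0", beam="1000.0"):
    """Build the HERest argument list that reads hmm<N-1> and writes hmm<N>."""
    prev_dir = f"hmm{round_number - 1}"
    next_dir = f"hmm{round_number}"
    return [
        "HERest", "-A", "-D", "-T", "1",
        "-C", "config/conf-train",
        "-I", mlf,
        "-t", "250.0", "150.0", beam,
        "-S", "files/train_sanitize.scp",
        "-H", f"{prev_dir}/macros",
        "-H", f"{prev_dir}/hmmdefs",
        "-M", next_dir,
        phones,
    ]


def run_rounds(first, last, execute=False, **kwargs):
    """Print, and optionally run, rounds first..last; return the commands."""
    commands = []
    for n in range(first, last + 1):
        cmd = herest_command(n, **kwargs)
        commands.append(cmd)
        print(f"mkdir hmm{n}")
        print(shlex.join(cmd))
        if execute:  # requires the HTK binaries
            os.makedirs(f"hmm{n}", exist_ok=True)
            subprocess.run(cmd, check=True)
    return commands


if __name__ == "__main__":
    run_rounds(1, 7)  # dry run: prints the seven commands
```

The later rounds in this README could be produced the same way, for example `run_rounds(12, 18, mlf="wordlist/aligned.mlf", phones="wordlist/monophones1", beam="3000.0")`.
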

5. Silence

   1. Copy directory hmm7 to hmm8
      Windows: `xcopy hmm7 hmm8`
      Ubuntu: `cp -r hmm7 hmm8`
   2. In `hmm8/hmmdefs`, copy the "sil" model and rename the copy "sp" (do not delete the original "sil" model; you still need it)
      Remove states 2 and 4 from the new "sp" model (i.e. keep the centre state of the old "sil" model)
      Change <NUMSTATES> to 3
      Change <STATE> to 2
      Change <TRANSP> to 3
      Change the matrix in <TRANSP> to a 3 by 3 array
      Change the numbers in the matrix as follows:
       `0.0 1.0 0.0
       0.0 0.9 0.1
       0.0 0.0 0.0`
   3. HHEd
       Create the "sil.hed" script containing:
        `AT 2 4 0.2 {sil.transP}
        AT 4 2 0.2 {sil.transP}
        AT 1 3 0.3 {sp.transP}
        TI silst {sil.state[3],sp.state[2]}`
      `mkdir hmm9`
      `HHEd -H hmm8/macros -H hmm8/hmmdefs -M hmm9 wordlist/sil.hed wordlist/monophones1`
   4. HERest 2x
      Windows: `HLEd -l * -d wordlist/dict -i wordlist/phones1.mlf wordlist/mkphones1.led wordlist/words_sanitize.mlf`
      Ubuntu: `HLEd -l '*' -d wordlist/dict -i wordlist/phones1.mlf wordlist/mkphones1.led wordlist/words_sanitize.mlf`
      `mkdir hmm10`
      `HERest -A -D -T 1 -C config/conf-train  -I wordlist/phones1.mlf -t 250.0 150.0 3000.0 -S files/train_sanitize.scp -H hmm9/macros -H  hmm9/hmmdefs -M hmm10 wordlist/monophones1`
      `mkdir hmm11`
      `HERest -A -D -T 1 -C config/conf-train  -I wordlist/phones1.mlf -t 250.0 150.0 3000.0 -S files/train_sanitize.scp -H hmm10/macros -H  hmm10/hmmdefs -M hmm11 wordlist/monophones1`
6. Realigning the data
   1. HVite
      VoxForge suggests `1000.0` for this step, but with that value, and with `3000.0` as well, some labels went missing, so I changed it to `5000.0`.
      Windows: `HVite -A -D -T 1 -l * -o SWT -b SENT-END -C config/conf-train -H hmm11/macros -H hmm11/hmmdefs -i wordlist/aligned.mlf -m -t 250.0 150.0 5000.0 -y lab -a -I wordlist/words_sanitize.mlf -S files/train_sanitize.scp wordlist/dict wordlist/monophones1> HVite_log`
      Ubuntu: `HVite -A -D -T 1 -l '*' -o SWT -b SENT-END -C config/conf-train -H hmm11/macros -H hmm11/hmmdefs -i wordlist/aligned.mlf -m -t 250.0 150.0 5000.0 -y lab -a -I wordlist/words_sanitize.mlf -S files/train_sanitize.scp wordlist/dict wordlist/monophones1> HVite_log`

   2. HERest 7x
      `mkdir hmm12`
      `HERest -A -D -T 1 -C config/conf-train  -I wordlist/aligned.mlf -t 250.0 150.0 3000.0 -S files/train_sanitize.scp -H hmm11/macros -H  hmm11/hmmdefs -M hmm12 wordlist/monophones1`
      `mkdir hmm13`
      `HERest -A -D -T 1 -C config/conf-train  -I wordlist/aligned.mlf -t 250.0 150.0 3000.0 -S files/train_sanitize.scp -H hmm12/macros -H  hmm12/hmmdefs -M hmm13 wordlist/monophones1`
      `mkdir hmm14`
      `HERest -A -D -T 1 -C config/conf-train  -I wordlist/aligned.mlf -t 250.0 150.0 3000.0 -S files/train_sanitize.scp -H hmm13/macros -H  hmm13/hmmdefs -M hmm14 wordlist/monophones1`
      `mkdir hmm15`
      `HERest -A -D -T 1 -C config/conf-train  -I wordlist/aligned.mlf -t 250.0 150.0 3000.0 -S files/train_sanitize.scp -H hmm14/macros -H  hmm14/hmmdefs -M hmm15 wordlist/monophones1`
      `mkdir hmm16`
      `HERest -A -D -T 1 -C config/conf-train  -I wordlist/aligned.mlf -t 250.0 150.0 3000.0 -S files/train_sanitize.scp -H hmm15/macros -H  hmm15/hmmdefs -M hmm16 wordlist/monophones1`
      `mkdir hmm17`
      `HERest -A -D -T 1 -C config/conf-train  -I wordlist/aligned.mlf -t 250.0 150.0 3000.0 -S files/train_sanitize.scp -H hmm16/macros -H  hmm16/hmmdefs -M hmm17 wordlist/monophones1`
      `mkdir hmm18`
      `HERest -A -D -T 1 -C config/conf-train  -I wordlist/aligned.mlf -t 250.0 150.0 3000.0 -S files/train_sanitize.scp -H hmm17/macros -H  hmm17/hmmdefs -M hmm18 wordlist/monophones1`

cd ..

mkdir decoder

cd decoder (indonesian-asr/decoder)

7. Recognizer evaluation

    # Without n-gram LM

    HBuild wordlist/wlist_sanitize result/network.w

    HVite -H hmm18/macros -H hmm18/hmmdefs -S files/train_sanitize.scp -l '*' -i result/recout_nogram.mlf -p 0.0 -s 5.0 -w result/network.w wordlist/dict wordlist/monophones1

    HResults -I wordlist/words_sanitize.mlf wordlist/monophones1 result/recout_nogram.mlf

    # With n-gram LM

    HBuild -n lm/ wordlist/wlist_sanitize result/network.w

    HVite -H hmm18/macros -H hmm18/hmmdefs -S files/train_sanitize.scp -l '*' -i result/recout_gram.mlf -p 0.0 -s 5.0 -w result/network.w wordlist/dict wordlist/monophones1

    HResults -I wordlist/words_sanitize.mlf wordlist/monophones1 result/recout_gram.mlf
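
HResults reports its scores from the counts of labels (N), deletions (D), substitutions (S) and insertions (I). As a worked illustration of how the two headline figures follow from those counts, here is a small sketch. The function name and the numbers in the example are hypothetical and are not results from this project.

```python
# Sketch: the two scores HResults prints, computed from its error counts.
# Correct  = (N - D - S) / N * 100
# Accuracy = (N - D - S - I) / N * 100


def word_scores(n_labels, deletions, substitutions, insertions):
    """Return (percent_correct, accuracy) for the given counts."""
    hits = n_labels - deletions - substitutions
    percent_correct = 100.0 * hits / n_labels
    accuracy = 100.0 * (hits - insertions) / n_labels
    return percent_correct, accuracy


if __name__ == "__main__":
    # Made-up counts, purely to show the arithmetic
    print(word_scores(n_labels=100, deletions=5, substitutions=10, insertions=3))
```

Accuracy is the stricter of the two because it also penalises inserted words, so it is the more useful figure when comparing the run without the language model against the run with it.
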

8. Decoder

   1. Install Julius
   2. Download SRILM
   3. Build the LM:
   4. Build the AM: `mkbinhmm -htkconf ../training/config/conf-train ../training/hmm12/hmmdefs`

    Install HDecode: `nmake /f htk_hdecode_nt.mkf all`
    `mkdir result`
    `HDecode -A -D -T 1 -H hmm13/macros -H hmm13/hmmdefs -C config/conf-test -S files/test_sanitize.scp -l * -i result/recout.mlf -w lm/ -p 0.0 -s 5.0 wordlist/dict wordlist/monophones1`
    Windows: `HVite -H hmm13/macros -H hmm13/hmmdefs -S files/test_sanitize.scp -l * -i result/recout.mlf -w ../lm/ -p 0.0 -s 5.0 wordlist/dict wordlist/monophones1`
    9. Step 9 (triphones)
    Windows: `HLEd -A -D -T 1 -n wordlist/triphones1 -l * -i wordlist/wintri.mlf wordlist/mktri.led wordlist/aligned.mlf`
    `HHEd -A -D -T 1 -H hmm18/macros -H hmm18/hmmdefs -M hmm19 wordlist/mktri.hed wordlist/monophones1`


Make sure Node.js is installed on your computer.

Run the program

  1. Make sure you are in the `web` folder (`cd web`)
  2. Open the Node.js command prompt (on Windows) or a regular terminal (on Linux)
  3. Type node.js
  4. Open your browser and go to `localhost:8800`
  5. Make sure you allow microphone access in the browser
  6. Click the Start Recording button to record, then Stop Recording to stop and save the file to `web/demo.wav`
  7. Play `demo.wav` with your application


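  1. `npm install binaryjs`
  2. `npm install express`
  3. `npm install fs` (`fs` is built into Node.js, so this install is normally unnecessary)
  4. `npm install jade`
  5. `npm install wav`
  6. `npm install recordrtc`
  7. `npm install child_process` (`child_process` is also built into Node.js)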


Run with `node cobaexec.js`. To change the output, use the `stdout` variable inside the function.