GAN_Harmonized_with_HMMs
Code: Completely Unsupervised Speech Recognition By A Generative Adversarial Network Harmonized With Iteratively Refined Hidden Markov Models
This is the implementation of our paper. In the paper, we propose an unsupervised speech (phoneme) recognition system that achieves a 33.1% phoneme error rate on TIMIT. The method uses a GAN-based model for unsupervised phoneme recognition, together with a set of HMMs that works in harmony with the GAN.
How to use
Dependencies
- tensorflow 1.13
- kaldi
- srilm (can be built with kaldi/tools/install_srilm.sh)
- librosa
Data preprocess
- Usage:
  - Modify path.sh with your Kaldi and srilm paths.
  - Modify config.sh with your code path and TIMIT path.
  - Run
    $ bash preprocess.sh
- This script extracts features and splits the dataset into train and test sets.
- The data needed by the WFST decoder is also generated here.
Train model
- Usage:
  - Modify the experimental settings in config.sh.
  - Modify the GAN-based model's parameters in src/GAN-based-model/config.yaml.
  - Run
    $ bash run.sh
- This script contains the training flow for the GAN-based model and the HMM model.
- The GAN-based model generates the transcriptions used to train the HMM model.
- The HMM model refines the phoneme boundaries used to train the GAN-based model.
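The alternation between the two models can be pictured with the sketch below. It is only an illustration, not the contents of run.sh: the function names train_gan and train_hmm, the iterations variable, and the hmm_round_N labels are hypothetical placeholders, and each step only prints what the real stage would do.

```shell
# Hypothetical sketch of the alternation; not the actual contents of run.sh.
set -eu

iterations=3        # assumed number of GAN/HMM rounds
boundaries=orc      # initial boundaries, as selected by bnd_type

# Placeholder steps: each one only prints what the real stage would do.
train_gan() { echo "round $1: train GAN with '$2' boundaries, output transcriptions"; }
train_hmm() { echo "round $1: train HMMs on GAN transcriptions, output refined boundaries"; }

i=1
while [ "$i" -le "$iterations" ]; do
  train_gan "$i" "$boundaries"
  train_hmm "$i"
  boundaries="hmm_round_$i"   # the next GAN round uses the HMM-refined boundaries
  i=$((i + 1))
done
```

Each round feeds the HMM-refined boundaries back into the next GAN training pass, which is the loop the list above describes.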
Note
- Training with boundaries generated by GAS (bnd_type=uns) is unstable and may need several training attempts to reach satisfactory performance.
Hyperparameters in config.sh
bnd_type : type of initial phoneme boundaries (orc/uns).
setting : matched or nonmatched case from our paper (match/nonmatch).
jobs : number of parallel jobs (depends on your device).
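As a rough illustration, the three settings might look like this in config.sh. This assumes the file uses plain shell variable assignments; the values shown are examples only, and the jobs count of 8 is arbitrary.

```shell
# Illustrative values only; the real config.sh also holds your code path and TIMIT path.
bnd_type=orc      # initial phoneme boundaries: orc or uns
setting=match     # match or nonmatch, as in the paper
jobs=8            # number of parallel jobs; adjust to your device
```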
Reference
Completely Unsupervised Speech Recognition By A Generative Adversarial Network Harmonized With Iteratively Refined Hidden Markov Models, Kuan-Yu Chen, Che-Ping Tsai et al.
Links
- The WFST decoder for the phoneme classifier.
- The training scripts for the unsupervised HMM.
Acknowledgement
Special thanks to Che-Ping Tsai (jackyyy0228) for the Kaldi parts! Special thanks to Sung-Feng Huang (b02901071) for the PyTorch version!