speech_signal_processing

If you have any questions, open an issue or email me.

Description

  • VAD.py is the first project.

  • MFCC_DTW.py is the second project.

  • GMM_UBM.py is the third project, and GUI.py is the GUI of this project.

  • d_vector.py is the final project, and Final_GUI.py is the GUI of this project.

  • The feature directory holds the saved models and feature files. You can download it from here; the code is iwmf.

  • All the reports are in the report directory.

Requirement

Python 3.x, Windows

To install the other packages, run the command below:

pip install -r requirements.txt

or

pip install dtw librosa fastdtw tqdm sidekit tensorflow keras numpy scipy pyqt sklearn

NOTE: You can use a mirror to speed up the download; refer to the blog.

Dataset

Download the dataset for d-vector from voxceleb.

Experiment Log

MFCC+DTW

DTW                             Time (s)   Acc (%)
accelerated_dtw                 92         83.72
accelerated_dtw+pre-emphasis    105        74.42
fastdtw                         71         60.47
fastdtw+pre-emphasis            79         65.12

Summary: fastdtw runs faster than accelerated_dtw but is less accurate, so I suggest using accelerated_dtw rather than fastdtw if accuracy matters more to you.
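To make the comparison above more concrete, here is a minimal sketch of DTW template matching in plain Python. It is not the code in MFCC_DTW.py: the function names (euclidean, dtw_distance, classify) and the toy 2-dimensional frames are assumptions made for illustration, and the exact quadratic recurrence shown here is the textbook one, not the approximation that fastdtw uses.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature frames."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw_distance(seq_a, seq_b, dist=euclidean):
    """Exact DTW cost between two frame sequences, O(len(seq_a) * len(seq_b))."""
    n, m = len(seq_a), len(seq_b)
    inf = float("inf")
    # cost[i][j] = best accumulated cost aligning seq_a[:i] with seq_b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = dist(seq_a[i - 1], seq_b[j - 1])
            # Extend the cheapest of the three allowed predecessor cells.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def classify(query, templates):
    """Return the label of the template with the smallest DTW cost to the query."""
    return min(templates, key=lambda label: dtw_distance(query, templates[label]))


# Toy 2-dimensional sequences standing in for MFCC frames (hypothetical data).
templates = {
    "one": [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]],
    "two": [[5.0, 5.0], [4.0, 4.0], [3.0, 3.0]],
}
# A time-stretched copy of "one": DTW absorbs the repeated frames.
query = [[0.0, 0.0], [0.0, 0.0], [1.0, 1.0], [2.0, 2.0], [2.0, 2.0]]
predicted = classify(query, templates)
```

In a real run the frames would be MFCC vectors extracted from audio, and each word or speaker would have one or more recorded templates.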

MFCC

blog

GMM

blog

paper

scikit-learn

SIDEKIT

MFCC+GMM

Please read report for more details.
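As a rough illustration of how GMM-UBM scoring usually works, the sketch below computes an average log-likelihood ratio between a speaker GMM and a universal background model (UBM). It is not taken from GMM_UBM.py or the report: the helper names, the diagonal-covariance assumption, and the toy 1-dimensional models are all hypothetical. In practice the speaker model is often adapted from the UBM rather than set by hand.

```python
import math


def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at frame x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def gmm_avg_loglik(frames, weights, means, variances):
    """Average per-frame log-likelihood of the frames under a diagonal GMM."""
    total = 0.0
    for x in frames:
        comps = [
            math.log(w) + log_gauss_diag(x, m, v)
            for w, m, v in zip(weights, means, variances)
        ]
        peak = max(comps)  # log-sum-exp trick for numerical stability
        total += peak + math.log(sum(math.exp(c - peak) for c in comps))
    return total / len(frames)


def llr_score(frames, speaker_gmm, ubm):
    """Log-likelihood ratio: positive values favour the claimed speaker."""
    return gmm_avg_loglik(frames, *speaker_gmm) - gmm_avg_loglik(frames, *ubm)


# Hypothetical 1-dimensional models given as (weights, means, variances).
ubm = ([0.5, 0.5], [[-2.0], [2.0]], [[1.0], [1.0]])
speaker_gmm = ([0.5, 0.5], [[1.5], [2.5]], [[0.5], [0.5]])

target_score = llr_score([[2.0], [1.8], [2.2]], speaker_gmm, ubm)
impostor_score = llr_score([[-2.0], [-1.9]], speaker_gmm, ubm)
```

A decision threshold on the score then separates accepted from rejected trials; where to place it depends on the data.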

d-vector

We train our models on the VoxCeleb dataset. For more details, please read the report.

model   time(s)   train_acc   valid_acc   epoch   test_acc   test_time
nn      56        0.5321      0.4672      50      0.3682     11.92
lstm    2906      0.788       0.5472      100     0.4371     49.53
gru     2977      0.9385      0.7484      30      0.3766     70.05

inference: paper

inference_gru: paper

inference_lstm: paper
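For readers new to d-vectors, the following sketch shows the general idea: frame-level embeddings from a trained network are normalized and averaged into one utterance-level vector, and two utterances are compared by cosine similarity. This is not the code in d_vector.py. The function names and the 2-dimensional embeddings are hypothetical, and the network that would produce the embeddings is left out.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else list(vec)


def d_vector(frame_embeddings):
    """Average L2-normalized frame embeddings into one utterance-level vector."""
    normalized = [l2_normalize(f) for f in frame_embeddings]
    dim = len(normalized[0])
    return [sum(f[k] for f in normalized) / len(normalized) for k in range(dim)]


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical frame embeddings; a real system would take them from a
# hidden layer of the trained nn, lstm, or gru model.
enrolled = d_vector([[1.0, 0.0], [0.9, 0.1]])
same_speaker = d_vector([[0.8, 0.2]])
other_speaker = d_vector([[0.0, 1.0], [0.1, 0.9]])

same_score = cosine_similarity(enrolled, same_speaker)
other_score = cosine_similarity(enrolled, other_speaker)
```

A higher cosine score means the test utterance sits closer to the enrolled speaker in embedding space.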

Reference

  1. audio-mnist-with-person-detection

  2. dVectorSpeakerRecognition

  3. speaker-verification

  4. voxceleb