speech_signal_processing

If you have any questions, open an issue or email me.

Description

VAD.py is the first project.
MFCC_DTW.py is the second project.
GMM_UBM.py is the third project, and GUI.py is its GUI.
d_vector.py is the final project, and Final_GUI.py is its GUI.
The feature directory holds the saved models and feature files. You can download it from here (extraction code: iwmf).
All reports are in the report directory.

Requirement

Python 3.x, Windows

To install the other packages, run the commands below.

pip install -r requirements.txt

pip install dtw librosa fastdtw tqdm sidekit tensorflow keras numpy scipy pyqt sklearn

NOTE: you can use a mirror to speed up the download; refer to the blog.

Dataset

Download the dataset for d-vector from VoxCeleb.

Experiment Log

MFCC+DTW

DTW	Time(s)	Acc(%)
accelerated_dtw	92	83.72
accelerated_dtw+pre-emphasis	105	74.42
fastdtw	71	60.47
fastdtw+pre-emphasis	79	65.12

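Summary: fastdtw is faster than accelerated_dtw but less accurate (60.47-65.12% vs. 74.42-83.72%), so I suggest using accelerated_dtw rather than fastdtw if accuracy matters more to you. Pre-emphasis lowered accuracy with accelerated_dtw and raised it with fastdtw.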
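To make the comparison above concrete, here is a minimal sketch of exact DTW template matching with an optional pre-emphasis step. It is not the code in MFCC_DTW.py: the function names (pre_emphasis, dtw_distance, classify), the 0.97 coefficient, and the Euclidean frame distance are assumptions for illustration, and plain Python lists stand in for real MFCC frames. fastdtw computes an approximation of this distance, which may be part of the reason for its lower accuracy.

```python
import math


def pre_emphasis(signal, coeff=0.97):
    """High-pass filter y[n] = x[n] - coeff * x[n-1] (coeff is an assumed value)."""
    if not signal:
        return []
    return [signal[0]] + [
        signal[n] - coeff * signal[n - 1] for n in range(1, len(signal))
    ]


def euclidean(a, b):
    """Euclidean distance between two feature frames of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw_distance(seq_a, seq_b, dist=euclidean):
    """Exact DTW cost between two sequences of frames, O(len_a * len_b)."""
    n, m = len(seq_a), len(seq_b)
    inf = float("inf")
    # cost[i][j] = best cumulative cost aligning seq_a[:i] with seq_b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = dist(seq_a[i - 1], seq_b[j - 1])
            cost[i][j] = d + min(
                cost[i - 1][j],      # advance in seq_a only
                cost[i][j - 1],      # advance in seq_b only
                cost[i - 1][j - 1],  # advance in both
            )
    return cost[n][m]


def classify(query, templates):
    """Return the label of the template with the smallest DTW cost to query."""
    return min(templates, key=lambda label: dtw_distance(query, templates[label]))


# Toy one-dimensional "features"; real input would be MFCC frames per utterance.
templates = {"a": [[0.0], [1.0], [2.0]], "b": [[5.0], [5.0], [5.0]]}
print(classify([[0.0], [0.0], [1.0], [2.0]], templates))
```

The query is a time-stretched copy of template "a", so DTW aligns it at zero cost and the sketch prints a.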

MFCC

blog

GMM

MFCC+GMM

Please read the report for more details.
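For readers who want the idea before opening the report, below is a minimal sketch of how a GMM-UBM system usually scores an utterance: the average per-frame log-likelihood under the speaker model minus the same quantity under the universal background model (UBM). It is not the code in GMM_UBM.py. The tuple layout (weights, means, variances), the diagonal covariances, and all function names are assumptions for illustration.

```python
import math


def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at frame x."""
    return -0.5 * sum(
        math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )


def gmm_frame_loglik(x, weights, means, variances):
    """Log-likelihood of one frame under a GMM, using log-sum-exp for stability."""
    logs = [
        math.log(w) + log_gauss_diag(x, m, v)
        for w, m, v in zip(weights, means, variances)
    ]
    top = max(logs)
    return top + math.log(sum(math.exp(value - top) for value in logs))


def avg_loglik(frames, gmm):
    """Average per-frame log-likelihood; gmm is (weights, means, variances)."""
    return sum(gmm_frame_loglik(x, *gmm) for x in frames) / len(frames)


def llr_score(frames, speaker_gmm, ubm):
    """Log-likelihood ratio: positive favours the claimed speaker."""
    return avg_loglik(frames, speaker_gmm) - avg_loglik(frames, ubm)


# Toy one-dimensional models (hypothetical values, not trained on any data).
ubm = ([0.5, 0.5], [[0.0], [4.0]], [[1.0], [1.0]])
speaker = ([1.0], [[0.0]], [[1.0]])
print(llr_score([[0.1], [-0.2]], speaker, ubm) > 0)
```

In a full system the speaker model is normally adapted from the UBM rather than trained from scratch, and the score is compared against a threshold tuned on held-out data.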

d-vector

We train our models on the VoxCeleb dataset; for more details, please read the report.

model	time(s)	train_acc	valid_acc	epoch	test_acc	test_time
nn	56	0.5321	0.4672	50	0.3682	11.92
lstm	2906	0.788	0.5472	100	0.4371	49.53
gru	2977	0.9385	0.7484	30	0.3766	70.05
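As a rough illustration of what happens at test time, the sketch below follows the usual d-vector recipe: frame-level embeddings taken from a hidden layer of the trained network are averaged into one utterance-level vector, and the test vector is compared with each enrolled speaker by cosine similarity. Whether d_vector.py does exactly this is an assumption; the function names and the toy two-dimensional embeddings are hypothetical, and the network itself is left out.

```python
import math


def average_frames(frame_embeddings):
    """Mean of the frame-level embeddings, one value per dimension."""
    count = len(frame_embeddings)
    dim = len(frame_embeddings[0])
    return [sum(frame[k] for frame in frame_embeddings) / count for k in range(dim)]


def l2_normalize(vec):
    """Scale vec to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def d_vector(frame_embeddings):
    """Utterance-level embedding: normalized mean of the frame embeddings."""
    return l2_normalize(average_frames(frame_embeddings))


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def identify(test_frames, enrolled):
    """Return the enrolled speaker whose d-vector is closest to the test utterance."""
    test_vec = d_vector(test_frames)
    return max(enrolled, key=lambda spk: cosine_similarity(test_vec, enrolled[spk]))


# Toy embeddings standing in for hidden-layer activations (hypothetical values).
enrolled = {
    "alice": d_vector([[1.0, 0.0], [0.9, 0.1]]),
    "bob": d_vector([[0.0, 1.0], [0.1, 0.9]]),
}
print(identify([[0.8, 0.2], [1.0, 0.1]], enrolled))
```

With these toy values the test utterance points in almost the same direction as the first enrolled vector, so the sketch prints alice.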

inference: paper

inference_gru: paper

inference_lstm: paper
