spoken-command-recognition

Does the synthesized command dataset work or not?

Open awoniu opened this issue 4 years ago • 5 comments

Was this project completed successfully? Is there any conclusion about using a synthesized dataset to train a model? I am thinking about doing a similar experiment and hope somebody can offer some suggestions. Thanks!

awoniu avatar Aug 26 '19 09:08 awoniu

Some projects have started using this data set for preliminary work, and you are more than welcome to do so as well (it is on Kaggle too). I myself do not have the expertise to develop elaborate RNNs etc., and am now focusing on other projects.

JohannesBuchner avatar Aug 26 '19 09:08 JohannesBuchner

OK. I have tried training an RNN (GRU+DNN) model on a synthesized dataset, which I built with an open source toolkit, SoundTouch (http://www.surina.net/soundtouch/). Here are some preliminary results: I started from two audio recordings of a command word (one male speaker and one female), changed the pitch, speed, and tempo, and added noise at different SNR levels, which gave me 3,000 command word audio samples. After training, the model (GRU+DNN) seems to recognize the synthesized command words easily, but it does not do well on real-world audio.
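For anyone trying to reproduce the noise step, here is a minimal sketch of mixing white Gaussian noise into a clip at a chosen SNR. It is not the exact code used above: the function names `mean_power` and `add_noise_at_snr`, the `seed` parameter, and the 440 Hz test tone are all assumptions made for illustration, and the pitch, speed, and tempo changes done with SoundTouch are not covered.

```python
import math
import random


def mean_power(samples):
    """Average power of a sequence of audio samples."""
    return sum(x * x for x in samples) / len(samples)


def add_noise_at_snr(signal, snr_db, seed=None):
    """Return the signal with white Gaussian noise added at the given SNR in dB.

    Hypothetical helper for illustration; assumes a non-empty list of floats.
    """
    rng = random.Random(seed)
    # SNR_dB = 10 * log10(P_signal / P_noise), so P_noise = P_signal / 10^(SNR_dB / 10).
    target_noise_power = mean_power(signal) / (10.0 ** (snr_db / 10.0))
    noise = [rng.gauss(0.0, 1.0) for _ in signal]
    # Rescale the drawn noise so its measured power matches the target.
    scale = math.sqrt(target_noise_power / mean_power(noise))
    return [s + scale * n for s, n in zip(signal, noise)]


# Stand-in for a recorded command word: 0.1 s of a 440 Hz tone at 16 kHz.
sample_rate = 16000
clean = [0.5 * math.sin(2 * math.pi * 440 * i / sample_rate) for i in range(1600)]

# One noisy copy per SNR level, from mild (20 dB) to heavy (0 dB) noise.
noisy_by_snr = {snr: add_noise_at_snr(clean, snr, seed=0) for snr in (20, 10, 0)}
```

White noise is only one choice here. Recorded background noise (street, office, babble) could be scaled with the same power ratio and may be closer to real-world conditions.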

awoniu avatar Aug 26 '19 09:08 awoniu

That is not overly surprising. Probably you want to use these synthetic data sets to extend real datasets. You can also try to increase the number of speakers, pronunciations and emphasis, as this project does.
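One way to read "extend real datasets" is to keep every real recording and add only a capped share of synthetic clips, so the synthetic data cannot dominate training. The sketch below assumes that interpretation; `mix_datasets`, `max_synthetic_fraction`, and the placeholder file names are hypothetical and not part of this project.

```python
import random


def mix_datasets(real, synthetic, max_synthetic_fraction=0.5, seed=0):
    """Combine all real samples with a capped number of synthetic samples.

    Hypothetical helper for illustration; samples can be any objects
    (file paths, feature arrays, (audio, label) pairs, ...).
    """
    if not 0.0 <= max_synthetic_fraction < 1.0:
        raise ValueError("max_synthetic_fraction must be in [0, 1)")
    rng = random.Random(seed)
    # n_syn / (n_real + n_syn) <= f  implies  n_syn <= f * n_real / (1 - f).
    limit = int(max_synthetic_fraction * len(real) / (1.0 - max_synthetic_fraction))
    chosen = rng.sample(list(synthetic), min(limit, len(synthetic)))
    combined = list(real) + chosen
    rng.shuffle(combined)
    return combined


# Placeholder names standing in for real and synthesized recordings.
real_clips = [f"real_{i}.wav" for i in range(100)]
synthetic_clips = [f"synth_{i}.wav" for i in range(3000)]
training_set = mix_datasets(real_clips, synthetic_clips, max_synthetic_fraction=0.5)
```

The right fraction is an open question and would need to be tuned against a validation set made only of real recordings.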

JohannesBuchner avatar Aug 26 '19 10:08 JohannesBuchner