vadnet I have some problems with this project

Open JunGenius opened this issue 5 years ago • 2 comments

Hello, author! First of all, thank you very much for providing the ideas that I implemented. I have some questions:

I have noticed that the neural network makes one classification decision per second of audio, but a single second can contain both speech and noise, for example 30% noise and 70% voice. How does it distinguish them?
If a voice segment lasts 1.2 seconds, the final 0.2 seconds may be classified as noise, resulting in an incomplete speech segment. How can this be solved?
I want to shorten the classification window, for example to 500 ms or 250 ms. Should I split the training speech and noise into 500 ms or 250 ms files and retrain a new model? Would that reduce the recognition rate?

I am looking forward to your answer, thank you again.

Jun 25 '19 16:06 JunGenius

No, a decision is made per frame (e.g. one second). But you can do two things: train your network on a shorter window size (see e.g. #7), and increase the overlap between frames, e.g. make a prediction every 0.1 s, then apply some post-processing to the sequence of decisions afterwards.
Again, I suggest increasing the overlap between frames.
See #7
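
The overlap and post-processing suggestion above could be sketched roughly as follows. This is only an illustration, not vadnet's actual code: the function names `sliding_windows`, `majority_smooth`, and `to_segments` are hypothetical, the majority vote is just one possible post-processing choice, and the per-window decisions are assumed to come from whatever classifier you trained (1 = voice, 0 = noise).

```python
from typing import List, Sequence, Tuple


def sliding_windows(num_samples: int, sample_rate: int,
                    window_s: float = 1.0, hop_s: float = 0.1) -> List[Tuple[int, int]]:
    """Return (start, end) sample indices of overlapping analysis windows."""
    window = int(round(window_s * sample_rate))
    hop = int(round(hop_s * sample_rate))
    return [(s, s + window) for s in range(0, num_samples - window + 1, hop)]


def majority_smooth(decisions: Sequence[int], width: int = 5) -> List[int]:
    """Replace each 0/1 decision by the majority vote of its neighbourhood.

    Ties count as noise (0). This is an assumed post-processing step,
    chosen here only because it is simple.
    """
    half = width // 2
    smoothed = []
    for i in range(len(decisions)):
        neighbours = decisions[max(0, i - half): i + half + 1]
        smoothed.append(1 if 2 * sum(neighbours) > len(neighbours) else 0)
    return smoothed


def to_segments(decisions: Sequence[int], hop_s: float = 0.1,
                window_s: float = 1.0) -> List[Tuple[float, float]]:
    """Merge runs of voice decisions into (start_s, end_s) segments.

    A segment runs from the start of its first window to the end of its
    last window, which is a deliberately generous convention.
    """
    segments = []
    start = None
    for i, d in enumerate(decisions):
        if d and start is None:
            start = i
        elif not d and start is not None:
            segments.append((round(start * hop_s, 3),
                             round((i - 1) * hop_s + window_s, 3)))
            start = None
    if start is not None:
        segments.append((round(start * hop_s, 3),
                         round((len(decisions) - 1) * hop_s + window_s, 3)))
    return segments


# Example: 2 s of 16 kHz audio, 1 s windows, one prediction every 0.1 s.
windows = sliding_windows(num_samples=32000, sample_rate=16000)
# Pretend these came from the classifier, one decision per window.
raw = [0, 1, 1, 0, 1, 1, 1, 0, 0, 1, 0, 0]
segments = to_segments(majority_smooth(raw, width=3))
```

With a 0.1 s hop, a 1.2 s utterance is covered by many overlapping windows instead of being cut at a hard 1 s boundary, and the smoothing step removes isolated flips in the decision sequence before segments are extracted.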

Jul 02 '19 06:07 frankenjoe

OK, thank you very much.

Jul 17 '19 16:07 JunGenius