Bring back -remove_silence, maybe

Open dhdaines opened this issue 2 years ago • 1 comments

I maintain that silently removing frames from the input in sphinx_fe is a super bad idea, but it seems that all of the public models out there were trained with this option, and the various users of the library expect it to exist.

I do not have any desire to develop a new VAD solution in the PocketSphinx library, and in the end it seems that debugging -remove_silence is the path of least resistance.

If this happens then the batch mode API (i.e. ps_process_raw(full_utt=True)) will be fixed to return the correct word alignments.
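To illustrate why dropped frames break word alignments, here is a minimal sketch, not the sphinx_fe implementation. It assumes a naive per-frame energy threshold, and the names remove_silence and restore_alignment are hypothetical helpers, not part of the PocketSphinx API. The point is that once frames are removed, the decoder's frame indices refer to the compressed input, so a map back to the original indices has to be kept.

```python
from typing import List, Sequence, Tuple

Frame = Sequence[float]
Segment = Tuple[str, int, int]  # (word, start_frame, end_frame), inclusive


def remove_silence(frames: Sequence[Frame], threshold: float) -> Tuple[List[Frame], List[int]]:
    """Drop frames whose mean energy is below `threshold`.

    Returns the kept frames plus, for each kept frame, its index in the
    original input. Hypothetical helper: a real VAD is more involved.
    """
    kept: List[Frame] = []
    index_map: List[int] = []
    for i, frame in enumerate(frames):
        if not frame:
            continue  # treat an empty frame as silence
        energy = sum(s * s for s in frame) / len(frame)
        if energy >= threshold:
            kept.append(frame)
            index_map.append(i)
    return kept, index_map


def restore_alignment(segments: Sequence[Segment], index_map: Sequence[int]) -> List[Segment]:
    """Map segment boundaries from kept-frame indices back to original indices."""
    return [(word, index_map[start], index_map[end]) for word, start, end in segments]


# Toy input: 3 silent frames, 2 speech, 2 silent, 2 speech.
silence, speech = [0.0, 0.0], [0.5, -0.5]
frames = [silence] * 3 + [speech] * 2 + [silence] * 2 + [speech] * 2

kept, index_map = remove_silence(frames, threshold=0.01)

# Made-up alignment, as a decoder would report it on the compressed input.
compressed = [("hello", 0, 1), ("world", 2, 3)]
print(restore_alignment(compressed, index_map))
# [('hello', 3, 4), ('world', 7, 8)]
```

Without the index map, "world" would be reported at frames 2-3 instead of 7-8, which is the kind of incorrect alignment batch mode would need to avoid.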

dhdaines, Jul 21 '22 20:07

On the other hand, a lot of users of the library seem to do their own VAD anyway (e.g. the Python speech_recognition module). For batch mode, -remove_silence is a terrible idea; one should do proper speaker diarization or use a real modern VAD.

Ideally we would retrain the models to not use this so they have proper silence models :(

dhdaines, Jul 22 '22 12:07

Nope. Won't do this, didn't do this. We have a separate endpointer for reasons detailed here: https://cmusphinx.github.io/2022/08/vad/

dhdaines, Sep 07 '22 23:09