pocketsphinx-python
Python vs. pocketsphinx_continuous/_batch - same config, different results
Hello all,
I've been playing with Pocketsphinx for a few days and was curious about the differences in behavior between the Python library and the available executables (pocketsphinx_continuous, pocketsphinx_batch).
I have enabled the verbose flag for the Python version and set the 3 fields that differed from the logs I got from the mentioned executables (vad_threshold, kws_threshold, allphone_ci). I expected the output of my Python code below to match one of the outputs generated by the bash scripts that call the executables, but that doesn't happen.
Could you please give me some hints about what else is different and what causes these differences? The audio files used for all the programs are the same, and all are mono, 16 kHz, 16-bit signed little-endian.
(Switching the ps.decode() argument no_search = True has no effect on the output, while full_utt = True produces no output at all. Where can I find out what exactly these two flags mean?)
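For reference, the audio format stated above (mono, 16 kHz, 16-bit) could be double-checked with a small script like the following. This is only a sketch using the standard-library wave module; the helper name check_wav_format is made up for illustration and is not part of pocketsphinx.

```python
import wave


def check_wav_format(path, rate=16000, channels=1, sample_width=2):
    """Return a list of mismatches between the WAV header and the expected format.

    Hypothetical helper, not part of pocketsphinx. The wave module only reads
    PCM WAV files, in which 16-bit samples are signed little-endian.
    """
    problems = []
    with wave.open(path, "rb") as wav:
        if wav.getnchannels() != channels:
            problems.append(f"channels: {wav.getnchannels()} (expected {channels})")
        if wav.getframerate() != rate:
            problems.append(f"sample rate: {wav.getframerate()} (expected {rate})")
        if wav.getsampwidth() != sample_width:
            problems.append(f"sample width: {wav.getsampwidth()} bytes (expected {sample_width})")
    return problems
```

An empty list means the header matches the expected format; this only inspects the header and says nothing about how each program reads the file.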
Below I'm attaching the code and files with the corresponding transcription outputs and configuration logs.
Python code (corresponding attachments: python_output_tuned.hyp.txt, python_tuned.log.txt):
import os
from pocketsphinx import Pocketsphinx, get_model_path

model_path = get_model_path()
config = {
    # using the default values - see https://pypi.org/project/pocketsphinx/
    'hmm': os.path.join(model_path, 'en-us'),
    'lm': os.path.join(model_path, 'en-us.lm.bin'),
    'dict': os.path.join(model_path, 'cmudict-en-us.dict'),
    'sampling_rate': 16000,
    'verbose': True,
    # with the following settings, the config should exactly match what we get with the wrapped scripts
    'vad_threshold': 2.0,
    'kws_threshold': 1.0,
    'allphone_ci': False
}
ps = Pocketsphinx(**config)

# path to the directory where the .wav files are stored
directory = "../my_records/jindra/converted"
out_hyp_file_path = "./python_output_github.hyp"

# sort alphabetically (the default order is arbitrary) to obtain output diff-able with that of pocketsphinx_batch
file_list = sorted(os.listdir(directory))

with open(out_hyp_file_path, "w") as out_hyp_file:
    for entry in file_list:
        entry_file = os.path.join(directory, entry)
        if os.path.isfile(entry_file) and entry.endswith(".wav"):
            ps.decode(audio_file=entry_file, buffer_size=2048, no_search=False, full_utt=False)
            hypothesis = ps.hypothesis()
            # format similar to the output of pocketsphinx_batch
            out_hyp_file.write(hypothesis + " (" + entry[:-4] + ")\n")
Pocketsphinx_cont_wrapper.sh (output_continuous.hyp.txt, continuous.log.config.txt):
#!/bin/bash
# make sure you're running from the .venv where your pocketsphinx is installed
model_dir=$(python3 -c "from pocketsphinx import get_model_path; print(get_model_path())")
curr_dir=$(pwd)
cd "$1"
out_file=output_continuous.hyp
if test -f "$out_file"; then
    rm "$out_file"
fi
for f in *.wav
do
    hyp=$(pocketsphinx_continuous -infile "$f" \
        -hmm "${model_dir}/en-us" \
        -lm "${model_dir}/en-us.lm.bin" \
        -dict "${model_dir}/cmudict-en-us.dict" \
        -samprate 16000 \
    )
    f_name=$(basename "$f" .wav)
    # this should give an output format similar to pocketsphinx_batch, so we can simply diff it
    echo "${hyp} (${f_name})" >> "$out_file"
done
cd "$curr_dir"
pocketsphinx_batch_wrapper.sh (output_batch.hyp.txt, batch.log.config.txt):
#!/bin/bash
# make sure you're running from the .venv where your pocketsphinx is installed
model_dir=$(python3 -c "from pocketsphinx import get_model_path; print(get_model_path())")
curr_dir=$(pwd)
cd "$1"
ctl_filename="ctlfile.txt"
# there's no -q flag for rm, so do it this way?
if test -f "$ctl_filename"; then
    rm "$ctl_filename"
fi
for f in *.wav
do
    basename "$f" .wav >> "$ctl_filename"
done
# The adcin seems to be important here
# https://cmusphinx.github.io/wiki/tutorialtuning/
pocketsphinx_batch -adcin yes \
    -cepdir . \
    -cepext .wav \
    -ctl "$ctl_filename" \
    -hmm "${model_dir}/en-us" \
    -lm "${model_dir}/en-us.lm.bin" \
    -dict "${model_dir}/cmudict-en-us.dict" \
    -samprate 16000 \
    -hyp output_batch.hyp
cd "$curr_dir"
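To compare the resulting .hyp files per utterance rather than with a plain diff, something like the sketch below could be used. It assumes each line has the form "text (utterance_id)", as written by the scripts above, and that pocketsphinx_batch may append a score after the id inside the parentheses. The function names read_hyp and diff_hyps are made up for illustration.

```python
def read_hyp(path):
    """Parse lines of the form 'some words (utt_id [score])' into {utt_id: text}."""
    result = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line.endswith(")") or "(" not in line:
                continue  # skip lines that do not follow the assumed format
            text, _, inside = line[:-1].rpartition("(")
            fields = inside.split()
            if not fields:
                continue
            # the first token is taken as the utterance id; a trailing score is ignored
            result[fields[0]] = text.strip()
    return result


def diff_hyps(a, b):
    """Return {utt_id: (text_a, text_b)} for utterances whose text differs."""
    return {
        utt: (a.get(utt), b.get(utt))
        for utt in sorted(set(a) | set(b))
        if a.get(utt) != b.get(utt)
    }
```

For example, diff_hyps(read_hyp("python_output_github.hyp"), read_hyp("output_batch.hyp")) would list only the utterances where the two transcriptions disagree; an utterance missing from one file shows up with None on that side.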