Python vs. pocketsphinx_continuous/_batch - same config, different results

Open JindrichSindelar-eaton opened this issue 4 years ago • 0 comments

Hello all,

I have been playing with Pocketsphinx for a few days and was curious about the differences in behavior between the Python library and the available executables (pocketsphinx_continuous, pocketsphinx_batch). I enabled the verbose flag for the Python version and adjusted the three fields that differed from the logs produced by those executables (vad_threshold, kws_threshold, allphone_ci). I expected the output of my Python code below to match at least one of the outputs generated by the bash scripts that call the executables, but it does not.

Could you please give me some hints about what else is different and what causes these differences? The audio files are the same for all three programs, and all are mono, 16 kHz, 16-bit signed little-endian.
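
For anyone reproducing this, the format claim can be checked directly from the WAV headers. The following is a minimal sketch using only the standard-library `wave` module; the helper names `wav_format` and `matches_expected` are hypothetical and not part of pocketsphinx.

```python
import wave


def wav_format(path):
    """Return (channels, sample_rate_hz, bits_per_sample) read from the WAV header."""
    with wave.open(path, "rb") as wav:
        return wav.getnchannels(), wav.getframerate(), wav.getsampwidth() * 8


def matches_expected(path, expected=(1, 16000, 16)):
    """True if the file is mono, 16 kHz, 16-bit.

    16-bit PCM samples in a RIFF/WAV file are signed little-endian by
    definition, so signedness and byte order need no separate check.
    """
    return wav_format(path) == expected
```

Calling `matches_expected(entry_file)` for every file in the directory before decoding would rule out a format mismatch as the cause of differing transcriptions.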

(Changing the ps.decode() arguments: no_search = True has no effect on the output, while full_utt = True produces no output at all. Where can I find what exactly these two flags mean?)

Below I'm attaching the code, along with files containing the corresponding transcription outputs and configuration logs.

Python code (corresponding attachments: python_output_tuned.hyp.txt, python_tuned.log.txt):

import os
from pocketsphinx import Pocketsphinx, get_model_path

model_path = get_model_path()
config = {
    # using the default values - see https://pypi.org/project/pocketsphinx/
    'hmm': os.path.join(model_path, 'en-us'),
    'lm': os.path.join(model_path, 'en-us.lm.bin'),
    'dict': os.path.join(model_path, 'cmudict-en-us.dict'),
    'sampling_rate': 16000,
    'verbose': True,
    # with the following settings, the configuration should exactly match that of the wrapped executables
    'vad_threshold': 2.0,
    'kws_threshold': 1.0,
    'allphone_ci': False
}

ps = Pocketsphinx(**config)

# path to the directory where the .wav's are stored
directory = "../my_records/jindra/converted"

out_hyp_file_path = "./python_output_github.hyp"
out_hyp_file = open(out_hyp_file_path, "w")


file_list = os.listdir(directory)
# sort alphabetically (listdir order is arbitrary) so the output can be diffed against that of pocketsphinx_batch
file_list.sort()

for entry in file_list:
    entry_file = os.path.join(directory, entry)
    if os.path.isfile(entry_file) and entry.endswith(".wav"):
        ps.decode(audio_file=entry_file, buffer_size=2048, no_search=False, full_utt=False)

        hypothesis = ps.hypothesis()
        # format similar to outputs of pocketsphinx_batch
        out_hyp_file.write(hypothesis + " (" + entry[:-4] + ")\n")

out_hyp_file.close()

Pocketsphinx_cont_wrapper.sh (output_continuous.hyp.txt, continuous.log.config.txt):

#!/bin/bash

# make sure you're running from the .venv where your pocketsphinx is installed
model_dir=$(python3 -c "from pocketsphinx import get_model_path; print(get_model_path())")

curr_dir=$(pwd)
cd "$1"

out_file=output_continuous.hyp

if test -f "$out_file"; then
    rm "$out_file"
fi

for f in *.wav
do
    hyp=$(pocketsphinx_continuous   -infile "$f" \
                                    -hmm "${model_dir}/en-us" \
                                    -lm "${model_dir}/en-us.lm.bin" \
                                    -dict "${model_dir}/cmudict-en-us.dict" \
                                    -samprate 16000)
    f_name=$(basename "$f" .wav)
    # this gives an output format similar to that of pocketsphinx_batch, so we can simply diff the two
    echo "${hyp} (${f_name})" >> "$out_file"
done

cd "$curr_dir"

pocketsphinx_batch_wrapper.sh (output_batch.hyp.txt, batch.log.config.txt):

#!/bin/bash

# make sure you're running from the .venv where your pocketsphinx is installed
model_dir=$(python3 -c "from pocketsphinx import get_model_path; print(get_model_path())")

curr_dir=$(pwd)
cd "$1"

ctl_filename="ctlfile.txt"

# rm -f does not complain when the file does not exist
rm -f "$ctl_filename"

for f in *.wav
do
    basename "$f" .wav >> "$ctl_filename"
done

# The adcin seems to be important here
# https://cmusphinx.github.io/wiki/tutorialtuning/
pocketsphinx_batch  -adcin yes \
                    -cepdir . \
                    -cepext .wav \
                    -ctl "$ctl_filename" \
                    -hmm "${model_dir}/en-us" \
                    -lm "${model_dir}/en-us.lm.bin" \
                    -dict "${model_dir}/cmudict-en-us.dict" \
                    -samprate 16000 \
                    -hyp output_batch.hyp

cd "$curr_dir"
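
Since all three outputs use lines of the form `text (name)`, they can also be compared per utterance instead of with a plain diff. This is a rough sketch under two assumptions: the helper names `parse_hyp` and `diff_hyps` are made up for illustration, and the parser tolerates an optional extra field (such as a score) after the name inside the parentheses, which may or may not appear in the actual pocketsphinx_batch output.

```python
import re

# "<text> (<utterance id>)", optionally with one more field after the id
# inside the parentheses (assumed, e.g. a score); that field is ignored.
_HYP_LINE = re.compile(r"^(?P<text>.*?)\s*\((?P<utt>\S+)(?:\s+\S+)?\)\s*$")


def parse_hyp(lines):
    """Map utterance id -> hypothesis text for each well-formed line."""
    result = {}
    for line in lines:
        match = _HYP_LINE.match(line)
        if match:
            result[match.group("utt")] = match.group("text").strip()
    return result


def diff_hyps(a, b):
    """Return {utt: (text_a, text_b)} where the texts differ or one is missing."""
    return {
        utt: (a.get(utt), b.get(utt))
        for utt in sorted(set(a) | set(b))
        if a.get(utt) != b.get(utt)
    }
```

For example, `diff_hyps(parse_hyp(open("python_output_github.hyp")), parse_hyp(open("output_batch.hyp")))` lists only the utterances on which the two decoders disagree.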

JindrichSindelar-eaton, Jun 04 '20 15:06