pocketsphinx
New force-alignment API and two-pass alignment to get phone/state durations
Now you can (relatively) easily do a second pass of alignment to get phone durations after decoding or word alignment.
Note that this ignores the previously existing word boundaries for the moment, which probably isn't ideal. We should be able to constrain the state alignment to respect them without much trouble. In theory that should mostly just speed up alignment (the second pass is a bit slow) and reduce memory consumption (it is currently quite large).
Also, yeah, word alignment now uses FSG search, like SoundSwallower, so it's really fast and handles silence and alternate pronunciations for you.
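For anyone who wants to follow along, here is a minimal sketch of what the two-pass flow might look like through the Python API. The file name, the phrase, the default model settings, and the exact Alignment attribute names are assumptions on my part rather than anything confirmed in this thread; cython/test/alignment_test.py is the real reference.

```python
import wave

from pocketsphinx import Decoder

# Assumption: the bundled en-us model and a 16 kHz mono WAV file;
# "goforward.wav" and "go forward ten meters" are placeholders.
decoder = Decoder(samprate=16000)

with wave.open("goforward.wav", "rb") as wav:
    audio = wav.readframes(wav.getnframes())

# First pass: force-align the known word sequence (FSG search, so optional
# silences and alternate pronunciations are handled for you).
decoder.set_align_text("go forward ten meters")
decoder.start_utt()
decoder.process_raw(audio, full_utt=True)
decoder.end_utt()
for seg in decoder.seg():
    print(seg.word, seg.start_frame, seg.end_frame)

# Second pass: switch to the alignment search (assumed to be built from the
# first-pass result when called with no argument) and run the same audio
# again to get phone and state durations.
decoder.set_alignment()
decoder.start_utt()
decoder.process_raw(audio, full_utt=True)
decoder.end_utt()
for word in decoder.get_alignment():
    print(word.name, word.start, word.duration)
```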
Excited to check this out! I'm at Interspeech and out of phase by half a day and all, but I'll take a look shortly.
No problem! The CLI for state alignment isn't quite there yet, but coming soon (tonight, I hope).
Fantastic! I also hope to try this out ASAP. I wonder whether constraining to the first pass's word boundaries will help. It seems like it can't hurt, but it would be interesting to measure how much.
It will definitely make the alignment faster. It may also make it more accurate, though I am not certain of this; I have to look at how I implemented this back in 2006: https://www.cs.cmu.edu/~dhuggins/Publications/phlab.pdf
EDIT: that paper was about forward-backward rather than alignment, so not the same thing at all. In that case I implemented something like semi-Viterbi training, setting "impossible" phone sequences to zero probability, which resulted in models that were better for alignment (but somewhat worse for recognition).
Hoping for state-level alignments, and frame-level scores as well, but LGTM and WFM.
State-level alignments are already there in the Python API; look at cython/test/alignment_test.py for an example. It is now easy to add them to the command-line front-end as well, so I'll do that (not on by default, though).
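For the impatient, a rough sketch of what reading those state-level alignments might look like, continuing from the two-pass example above. The nested iteration (words, then phones, then states) and the entry attributes are assumptions on my part, not confirmed API details; again, cython/test/alignment_test.py is the authoritative example.

```python
# Assumes the second alignment pass above has already been run.
alignment = decoder.get_alignment()
for word in alignment:
    print("word ", word.name, word.start, word.duration)
    for phone in word:
        print("phone", phone.name, phone.start, phone.duration)
        for state in phone:
            print("state", state.name, state.start, state.duration)
```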