openfst-utils
Utilities for manipulating finite state transducers with the OpenFst library.
Utilities for OpenFst
Benoit Favre [email protected], released under the Apache License, Version 2.0.
This is a set of programs for manipulating finite state transducers with the OpenFst library.
- fstcompile-nolex [-t]: compile an acceptor (or, with -t, a transducer), generating symbol lexicons on the fly and saving them with the fst. Useful for quick hacks on a single fst.
- fstcompose-maplex fst1 fst2: compose after mapping symbol tables (for use with fstcompile-nolex, which generates a different symbol table for each fst). fst1 or fst2 can be "" (empty string), which means stdin.
- fstminimize-transducer: encode input/output labels, rmepsilon, determinize, minimize and decode in one pass.
- fstdeterminize-tc-lex: keep the best output for each input sequence in a transducer using determinization in the (Tropical, Categorial)-Lexicographic semiring. See "Efficient Determinization of Tagged Word Lattices using Categorial and Lexicographic Semirings", by Izhak Shafran et al., ASRU 2011.
- fstsuperfinal-noepsilon: add a superfinal state without adding epsilon arcs.
- fstoracle: compute the shortest distance alignment between two transducers.
- fstcompose-specials: compose two transducers using special sigma, rho and phi transitions. sigma can replace any input symbol; rho is like sigma but only if no other path can be followed; phi is an epsilon transition which can be followed if no other transition matches an input symbol. Note that lexicons from the two fsts are mapped.
fstprint sentence.fst:
0 1 the
1 2 boy
2 3 is
3 4 here
4
fstprint model.fst:
0 0
./fstcompose-specials model.fst < sentence.fst | fstprint
0 1 the
- fstposteriors: compute arc-level posterior probabilities from an automaton where weights are -log probs in the tropical semiring.
- fstprint-nbest-strings: compute the n-best paths and print the string of input/output symbols for each one (or only the input symbols if input == output).
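To make the fstposteriors entry above more concrete, here is a minimal Python sketch of one common way to obtain arc-level posteriors from -log weights: a forward-backward pass in which path costs are combined with log-add. It is not taken from the tool's source. The function names, the tuple layout of the arcs, and the requirement that the automaton is acyclic with states numbered in topological order are all assumptions made for illustration.

```python
import math
from collections import defaultdict

def logadd(a, b):
    """Combine two costs (-log probs): returns -log(exp(-a) + exp(-b))."""
    if a == math.inf:
        return b
    if b == math.inf:
        return a
    return min(a, b) - math.log1p(math.exp(-abs(a - b)))

def arc_posteriors(arcs, start, finals):
    """Return one posterior probability per arc.

    arcs   -- list of (src, dst, label, cost) tuples, cost = -log prob
    start  -- start state id
    finals -- dict mapping final state id to its final cost
    Assumption: acyclic automaton, src < dst for every arc.
    """
    states = sorted({start} | set(finals) | {s for a in arcs for s in a[:2]})
    by_src = defaultdict(list)
    for arc in arcs:
        by_src[arc[0]].append(arc)

    # Forward costs: total cost of all paths from the start to each state.
    alpha = defaultdict(lambda: math.inf)
    alpha[start] = 0.0
    for q in states:
        for _, dst, _, w in by_src[q]:
            alpha[dst] = logadd(alpha[dst], alpha[q] + w)

    # Backward costs: total cost of all paths from each state to a final state.
    beta = defaultdict(lambda: math.inf)
    for q, cost in finals.items():
        beta[q] = cost
    for q in reversed(states):
        for _, dst, _, w in by_src[q]:
            beta[q] = logadd(beta[q], w + beta[dst])

    total = beta[start]  # cost of all accepting paths together
    # Posterior of an arc = mass of paths through it / mass of all paths.
    return [math.exp(total - (alpha[s] + w + beta[d])) for s, d, _, w in arcs]
```

For example, with two parallel arcs of probability 0.3 and 0.1 followed by a single arc of probability 1, the sketch normalizes the parallel arcs to 0.75 and 0.25 and gives the shared arc a posterior of 1. It does not handle an automaton with no accepting path.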
--- less useful / non-working stuff ---
- add-tags: generate a tagging transducer from a word acceptor according to a tab-separated, one-word-per-line tag dictionary.
- ngram-expand: expand a transducer so that each arc has a fixed context of up to n-1 arcs (so-called ngramize). Non-functional at this time.
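As an illustration of what the add-tags entry above describes, the following Python sketch turns a word acceptor in fstprint-style text form into tagging transducer lines. It is only a sketch under stated assumptions, not the program's actual behavior: the function name, the dictionary layout (a word followed by its tags, separated by tabs), and the hypothetical UNK tag for words missing from the dictionary are all invented for the example.

```python
def tagging_transducer(acceptor_lines, dictionary_lines, unknown_tag="UNK"):
    """Build text-format transducer lines mapping each word to its tags.

    acceptor_lines   -- lines such as "0 1 the" (arc) or "4" (final state)
    dictionary_lines -- assumed layout: "word<TAB>tag1<TAB>tag2..."
    unknown_tag      -- hypothetical fallback for words not in the dictionary
    """
    tags = {}
    for line in dictionary_lines:
        fields = line.rstrip("\n").split("\t")
        if fields and fields[0]:
            tags.setdefault(fields[0], []).extend(fields[1:])

    out = []
    for line in acceptor_lines:
        fields = line.split()
        if len(fields) >= 3:
            # Arc line: emit one word:tag arc per tag, keeping any weight.
            src, dst, word = fields[:3]
            for tag in tags.get(word) or [unknown_tag]:
                out.append(" ".join([src, dst, word, tag] + fields[3:]))
        elif fields:
            # Final-state line: copy unchanged.
            out.append(" ".join(fields))
    return out
```

With the acceptor lines "0 1 the", "1 2 boy", "2" and a dictionary giving "the" the tag DET and "boy" the tags NOUN and NN, the sketch returns one arc for "the", two parallel arcs for "boy", and the final-state line.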