Topology choices
It would be nice to have a choice about the topology to use, something that could be passed in, e.g. as a string, when we do training. For example, we could have a wrapper
build_topo(tokens: List[int], topo_type: str = 'ctc', num_states: int = 1)
where you can specify, for instance, 'left_to_right' for the traditional left-to-right HMM topology without a blank, with a specifiable num_states (we expect this will normally be 1).
Caution: the tokens list should not contain 0; I believe we should probably make build_ctc_topo add the 0 itself internally, which IMO would be a better interface.
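A minimal sketch of what that wrapper could look like (assuming build_ctc_topo stays in snowfall.training.ctc_graph, and a build_left_to_right_topo as sketched further down; the names and the exact handling of 0 are up for discussion):

```python
from typing import List

import k2
from snowfall.training.ctc_graph import build_ctc_topo


def build_topo(tokens: List[int], topo_type: str = 'ctc',
               num_states: int = 1) -> k2.Fsa:
    # Callers pass tokens without 0; blank/epsilon handling is internal.
    assert 0 not in tokens
    if topo_type == 'ctc':
        # The existing build_ctc_topo expects 0 in its input, so add it here.
        return build_ctc_topo([0] + tokens)
    elif topo_type == 'left_to_right':
        return build_left_to_right_topo(tokens, num_states)
    else:
        raise ValueError(f'unknown topo_type: {topo_type}')
```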
build_left_to_right_topo(tokens: List[int], num_states: int = 1) -> Fsa
This left-to-right topology will be useful for training alignment models, for instance.
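Here is a rough sketch of how build_left_to_right_topo might look, in the style of the existing build_ctc_topo and using the same k2 text format (five fields per arc: src, dst, label, aux_label, score). Treating num_states as a minimum-duration chain per token is my assumption about the intended semantics:

```python
from typing import List

import k2


def build_left_to_right_topo(tokens: List[int],
                             num_states: int = 1) -> k2.Fsa:
    '''Left-to-right HMM topology without a blank.  Each token gets a
    chain of num_states states (a minimum-duration constraint) ending
    in a self-loop; the output side emits the token once, on entry to
    its chain.
    '''
    assert 0 not in tokens and num_states >= 1
    final = 1 + len(tokens) * num_states
    arcs = []
    for i, t in enumerate(tokens):
        first = 1 + i * num_states          # entry state of t's chain
        last = first + num_states - 1       # looping state of t's chain
        arcs.append((0, first, t, t))       # start -> t, emits t
        for s in range(first, last):
            arcs.append((s, s + 1, t, 0))   # forced forward moves
        arcs.append((last, last, t, 0))     # self-loop for extra frames
        for j, u in enumerate(tokens):      # t -> any next token u
            arcs.append((last, 1 + j * num_states, u, u))
        arcs.append((last, final, -1, -1))  # termination
    # k2's text format expects arcs sorted by source state, with the
    # final state alone on the last line.
    lines = [f'{s} {d} {l} {a} 0.0' for s, d, l, a in sorted(arcs)]
    lines.append(f'{final}')
    return k2.Fsa.from_str('\n'.join(lines))
```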
@pzelasko something else that will be useful for word alignments is if we can add an auxiliary label word_start
to the lexicon FST. This would be a label on the 1st arc of the 1st phone of each word, indicating the word-id. For many purposes, e.g. building a traditional decoding graph, we can remove this before using it; but it will be useful for getting word alignments. We'd have to write a function that processes a 1-best lattice path into word alignments, by first segmenting using the word-ids and then stripping out any optional silence and/or blank (if relevant) from the end of each word. Of course this will be more accurate when using a xent or MMI model than CTC.
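To make that post-processing step concrete, here is a purely illustrative sketch. It assumes we have already extracted, from the 1-best path, the per-frame token IDs and a parallel per-frame sequence of word_start labels (0 meaning no word starts at that frame); every name and parameter in it is hypothetical:

```python
from typing import FrozenSet, List, Tuple


def path_to_word_alignments(
        frame_tokens: List[int],   # per-frame token IDs from the 1-best path
        word_starts: List[int],    # per-frame word-ids; 0 = no word starts here
        blank_id: int = 0,
        silence_tokens: FrozenSet[int] = frozenset()
) -> List[Tuple[int, int, int]]:
    '''Returns (word_id, start_frame, end_frame) triples, end exclusive.'''
    # First, segment the path at the frames where a word_start label fires.
    segments = []
    cur_word, cur_start = None, 0
    for i, w in enumerate(word_starts):
        if w != 0:
            if cur_word is not None:
                segments.append((cur_word, cur_start, i))
            cur_word, cur_start = w, i
    if cur_word is not None:
        segments.append((cur_word, cur_start, len(word_starts)))
    # Then strip optional silence and/or blank (if relevant) from the
    # end of each word.
    strippable = silence_tokens | {blank_id}
    ans = []
    for word, start, end in segments:
        while end > start + 1 and frame_tokens[end - 1] in strippable:
            end -= 1
        ans.append((word, start, end))
    return ans
```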
I also want to have example scripts for training a xent model where we subtract the prior from the nnet output; even if this is not better than regular CTC WER-wise, it will be useful for alignment purposes. We can initialize the phone prior to all-equal, and update it using a forgetting factor from the nnet output.
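A sketch of that prior bookkeeping, assuming PyTorch and frame-level log-posteriors of shape (N, T, num_phones); the class name and the forgetting-factor value are placeholders:

```python
import torch


class PhonePrior:
    '''Running phone-prior estimate: initialized all-equal, updated from
    the nnet output with a forgetting factor, and subtracted (in log
    space) from the nnet output to get pseudo-log-likelihoods.'''

    def __init__(self, num_phones: int, forgetting_factor: float = 0.99):
        self.prior = torch.full((num_phones,), 1.0 / num_phones)
        self.forgetting_factor = forgetting_factor

    def update(self, log_posteriors: torch.Tensor) -> None:
        # Average the posteriors over batch and time, then interpolate
        # with the running estimate.
        with torch.no_grad():
            batch_prior = log_posteriors.exp().mean(dim=(0, 1))
            f = self.forgetting_factor
            self.prior = f * self.prior + (1.0 - f) * batch_prior

    def subtract(self, log_posteriors: torch.Tensor) -> torch.Tensor:
        # log p(x|phone) = log p(phone|x) - log p(phone) + const.
        return log_posteriors - self.prior.clamp_min(1e-10).log()
```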