
Topology choices

danpovey opened this issue on May 13, 2021 · 0 comments

It would be nice to have a choice of topology that could be passed in somehow, e.g. as a string, when we do training. For instance, we could have a wrapper:

build_topo(tokens: List[int], topo_type: str = 'ctc', num_states: int = 1) -> Fsa

where you can specify, for instance, 'left_to_right' for the traditional left-to-right HMM topology without a blank, with a specifiable num_states (we expect this will normally be 1). Caution: the tokens list should not contain 0, and I believe we should probably make build_ctc_topo add the 0 itself internally, which IMO would be a better interface.
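
For concreteness, a minimal sketch of that wrapper, assuming the existing build_ctc_topo in snowfall.training.ctc_graph and the build_left_to_right_topo proposed just below; the CTC branch adds the 0 internally, per the suggestion above:

```python
from typing import List

import k2

# Existing helper; build_left_to_right_topo is the proposed function
# sketched further below in this issue.
from snowfall.training.ctc_graph import build_ctc_topo


def build_topo(tokens: List[int],
               topo_type: str = 'ctc',
               num_states: int = 1) -> k2.Fsa:
    assert 0 not in tokens, 'tokens should not contain 0 (blank/epsilon)'
    if topo_type == 'ctc':
        # Add the blank (0) internally instead of making every caller do it.
        return build_ctc_topo([0] + list(tokens))
    elif topo_type == 'left_to_right':
        return build_left_to_right_topo(tokens, num_states=num_states)
    else:
        raise ValueError(f'Unknown topo_type: {topo_type!r}')
```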

build_left_to_right_topo(tokens: List[int], num_states: int = 1) -> Fsa

This left-to-right topology will be useful for training alignment models, for instance.
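
A rough sketch of how build_left_to_right_topo might look in k2, in the same string-based style as the current build_ctc_topo; the arc layout (a chain of num_states states per token, with self-loops outputting epsilon) and the Fsa.from_str(..., acceptor=False) usage are assumptions about the design, not settled choices:

```python
from typing import List

import k2


def build_left_to_right_topo(tokens: List[int], num_states: int = 1) -> k2.Fsa:
    """Left-to-right HMM topology without a blank, as a k2 transducer.

    Each token gets a chain of `num_states` states, so a token must span
    at least `num_states` frames. The arc entering a token's chain emits
    the token on the output side; self-loops and within-chain arcs emit
    epsilon (0). `tokens` must not contain 0, which is reserved.
    """
    assert 0 not in tokens, 'Token 0 is reserved for epsilon/blank'
    assert num_states >= 1

    num_tokens = len(tokens)
    final_state = 1 + num_tokens * num_states

    def entry(i: int) -> int:
        # First state of token i's chain; state 0 is the start state.
        return 1 + i * num_states

    arcs = []  # (src, dst, label, aux_label, score)
    for i, tok in enumerate(tokens):
        first = entry(i)
        last = first + num_states - 1
        # Enter token i from the start state, emitting the token.
        arcs.append((0, first, tok, tok, 0.0))
        # Enter token i from the last state of any token's chain. This
        # includes its own chain: immediate repeats are inherently
        # ambiguous without a blank, but composing with a transcript
        # pins down the number of copies.
        for j in range(num_tokens):
            arcs.append((entry(j) + num_states - 1, first, tok, tok, 0.0))
        for s in range(first, last + 1):
            arcs.append((s, s, tok, 0, 0.0))          # stay in the token
            if s < last:
                arcs.append((s, s + 1, tok, 0, 0.0))  # advance the chain
        arcs.append((last, final_state, -1, -1, 0.0))  # end of utterance

    lines = [f'{s} {d} {l} {a} {w}' for s, d, l, a, w in sorted(arcs)]
    lines.append(str(final_state))
    return k2.arc_sort(k2.Fsa.from_str('\n'.join(lines), acceptor=False))
```

Note that with num_states > 1 this also enforces a minimum duration of num_states frames per token, which is part of what makes it attractive for alignment models.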

@pzelasko something else that will be useful for word alignments is if we can add an auxiliary label word_start to the lexicon FST. This would be a label on the 1st arc of the 1st phone of each word, indicating the word-id. For many purposes, e.g. building a traditional decoding graph, we can remove it before use; but it will be useful for getting word alignments. We'd have to write a function that processes a 1-best lattice path into word alignments, first segmenting by word-id and then stripping out any optional silence and/or blank (if relevant) from the end of each word. Of course this will be more accurate when using a xent or MMI model than CTC.
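
To make that last step concrete, a hedged sketch of the segmentation function, operating on the frame-synchronous label sequence and the (proposed) word_start sequence we'd read off a 1-best path (e.g. from k2.shortest_path); the names and the silence/blank ids are placeholders, not an existing API:

```python
from typing import List, Tuple


def word_alignments_from_path(labels: List[int],
                              word_starts: List[int],
                              silence_id: int = 1,
                              blank_id: int = 0) -> List[Tuple[int, int, int]]:
    """Turn a linear (1-best) path into (word_id, start_frame, end_frame).

    `labels[t]` is the phone/token consumed at frame t, and
    `word_starts[t]` is the word-id if the arc at frame t begins a word,
    else 0. silence_id / blank_id stand in for whatever symbols the
    setup actually uses.
    """
    starts = [(t, w) for t, w in enumerate(word_starts) if w != 0]
    alignments = []
    for k, (start, word_id) in enumerate(starts):
        end = starts[k + 1][0] if k + 1 < len(starts) else len(labels)
        # Strip trailing optional silence and/or blank so the word does
        # not absorb the pause after it.
        while end > start + 1 and labels[end - 1] in (silence_id, blank_id):
            end -= 1
        alignments.append((word_id, start, end))
    return alignments
```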

I also want to have example scripts for training a xent model where we subtract the (log) prior from the nnet output; even if this is not better than regular CTC WER-wise, it will be useful for alignment purposes. We can initialize the phone prior to all-equal and update it from the nnet output using a forgetting factor.
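
A rough sketch of that prior bookkeeping in PyTorch, assuming the nnet output is log-softmax over phones; the class shape and the forgetting-factor value are illustrative only:

```python
import torch


class PhonePrior:
    """Running estimate of the phone prior with a forgetting factor."""

    def __init__(self, num_phones: int, forget: float = 0.99):
        # Start from an all-equal (uniform) prior.
        self.prior = torch.full((num_phones,), 1.0 / num_phones)
        self.forget = forget

    def update(self, nnet_output: torch.Tensor) -> None:
        # nnet_output: (N, T, num_phones) log-probs; average the posteriors
        # over the batch (ignoring padding frames, for simplicity).
        batch_mean = nnet_output.exp().mean(dim=(0, 1))
        self.prior = self.forget * self.prior + (1.0 - self.forget) * batch_mean

    def subtract(self, nnet_output: torch.Tensor) -> torch.Tensor:
        # log p(x|phone) = log p(phone|x) - log p(phone) + const, i.e.
        # pseudo-log-likelihoods suitable for alignment/decoding.
        return nnet_output - self.prior.clamp_min(1e-10).log()
```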
