snowfall
How to combine the neural net log-softmax outputs and an FSA
Hello, I am reading train.py and decode.py. It is difficult for me to understand how the neural net log-softmax outputs are combined with an FSA. Could you provide some papers or a description to help me understand? Thanks. Here is the code I don't understand:
dense_fsa_vec = k2.DenseFsaVec(nnet_output, supervision_segments)
You can find some descriptions about it by visiting the following two links:
- https://github.com/k2-fsa/k2/blob/2dbb3e09b152fcf98354c946baa271e5b57c8321/k2/csrc/fsa.h#L114
/*
Vector of FSAs that actually will come from neural net log-softmax outputs (or
similar).
Conceptually this is a 3-dimensional tensor of log-probs with the second
dimension ragged, i.e. the shape would be [ num_fsas, None, num_symbols+1 ],
e.g. if this were a TF ragged tensor. The indexing would be
[fsa_idx,t,symbol+1], where the "+1" after the symbol is so that we have
somewhere to put the output for symbol == -1 (remember, -1 is kFinalSymbol,
used on the last frame).
Also, if a particular FSA has T frames of neural net output, we actually
have T+1 potential indexes, 0 through T, so there is space for the terminating
final-symbol on frame T. (On the last frame, the final symbol has
logprob=0, the others have logprob=-inf).
*/
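To make the layout described in that comment concrete, here is a small hand-built sketch (plain Python, with made-up log-prob values) for one FSA with T = 2 frames and 3 real symbols: the score matrix has T+1 rows and num_symbols+1 columns, where column 0 holds the score for symbol -1 (kFinalSymbol), which is 0 on the extra last frame and -inf everywhere else.

```python
import math

T = 2            # frames of neural net output for this FSA
num_symbols = 3  # real output symbols 0..2

NEG_INF = -math.inf

# Column 0 stores the score for symbol -1 (kFinalSymbol);
# column s+1 stores the score for real symbol s.
scores = []
for t in range(T):
    # Ordinary frames: the final symbol is impossible; the real
    # symbols get (made-up) log-probs from the log-softmax output.
    scores.append([NEG_INF, -0.1, -2.3, -4.5])

# Extra frame T: only the final symbol is allowed, with logprob = 0.
scores.append([0.0] + [NEG_INF] * num_symbols)

assert len(scores) == T + 1               # T+1 potential indexes, 0..T
assert len(scores[0]) == num_symbols + 1  # "+1" column for symbol == -1
```

This is only an illustration of the indexing scheme; in k2 the actual storage is a contiguous tensor managed by `DenseFsaVec`.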
- https://github.com/k2-fsa/k2/blob/2dbb3e09b152fcf98354c946baa271e5b57c8321/k2/python/k2/dense_fsa_vec.py#L15
class DenseFsaVec(object):

    def __init__(self, log_probs: torch.Tensor,
                 supervision_segments: torch.Tensor) -> None:
        '''Construct a DenseFsaVec from neural net log-softmax outputs.

        Args:
          log_probs:
            A 3-D tensor of dtype ``torch.float32`` with shape ``(N, T, C)``,
            where ``N`` is the number of sequences, ``T`` the maximum input
            length, and ``C`` the number of output classes.
          supervision_segments:
            A 2-D **CPU** tensor of dtype ``torch.int32`` with 3 columns.
            Each row contains information for a supervision segment. Column 0
            is the ``sequence_index`` indicating which sequence this segment
            comes from; column 1 specifies the ``start_frame`` of this segment
            within the sequence; column 2 contains the ``duration`` of this
            segment.

        Note:
          - ``0 < start_frame + duration <= T``
          - ``0 <= start_frame < T``
          - ``duration > 0``
        '''
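As a sketch of what a valid ``supervision_segments`` argument looks like (values are made up for illustration; in practice lhotse derives them from the supervisions attached to each cut, and the real argument is a ``torch.int32`` tensor rather than a list):

```python
# A toy batch: N = 2 sequences, each padded to T = 10 frames.
T = 10

# One row per supervision segment:
#   [sequence_index, start_frame, duration]
supervision_segments = [
    [0, 0, 10],  # sequence 0: one supervision covering all 10 frames
    [1, 0, 6],   # sequence 1: first supervision, frames 0..5
    [1, 6, 4],   # sequence 1: second supervision, frames 6..9
]

# Check the constraints stated in the docstring.
for seq_idx, start_frame, duration in supervision_segments:
    assert 0 <= start_frame < T
    assert duration > 0
    assert 0 < start_frame + duration <= T
```

Note that a single sequence may contribute several rows, one per supervision, which is exactly why the segment info is needed to slice ``nnet_output``.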
Mm, thanks, I have seen these two materials, but they are not enough for me. Could you provide other material?
I am writing tutorials for k2. Please just wait for a few days.
@Curisan let me just add some notes in case you are eager to learn about this before fangjun's documentation is ready.
When we train or decode, we usually feed data into the nnet model batch by batch. We prepare the batch data with ``K2SpeechRecognitionIterableDataset`` in lhotse:
https://github.com/lhotse-speech/lhotse/blob/08c31c3bd2711d4b6c614d64a1d3c26abb892a37/lhotse/dataset/speech_recognition.py#L86-L94
You can see that a batch is a set of ``Cut``s, and each ``Cut`` may have multiple supervisions. So the question we have now is: after feeding features of shape ``(N, T, C_feature)`` into the nnet and getting ``nnet_output`` of shape ``(N, T, C_nnet_output)``, we need to know which part of ``nnet_output`` corresponds to each supervision, right? This is exactly what ``k2.DenseFsaVec(nnet_output, supervision_segments)`` does. ``supervision_segments`` gives the ``seq_idx`` (which indexes ``N`` in ``nnet_output``), ``start_frame``, and ``num_frames`` (which index ``T`` in ``nnet_output``), so with that information we can easily get the part of ``nnet_output`` for each supervision in ``DenseFsaVec``. (Of course, if we do subsampling in the model, e.g. in a TDNN, we need to apply the same subsampling to ``start_frame`` and ``num_frames`` as well.)
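The subsampling remark above can be sketched in plain Python as follows (a simplified illustration with a made-up subsampling factor; the exact mapping depends on the model's strides and padding):

```python
# Suppose the model subsamples the time axis by a factor of 4
# (e.g. a strided TDNN), so input frame t maps roughly to
# frame t // 4 of nnet_output.
subsampling_factor = 4

def subsample_segment(start_frame, num_frames, factor):
    """Map a segment given in input frames to nnet_output frames.

    A simplified sketch; real models may need to account for
    padding and edge effects at segment boundaries.
    """
    new_start = start_frame // factor
    new_num = num_frames // factor
    return new_start, max(new_num, 1)  # keep at least one frame

# A supervision covering input frames 100..259 (duration 160)
# becomes nnet_output frames 25..64 (duration 40).
start, num = subsample_segment(100, 160, subsampling_factor)
```

The point is simply that ``supervision_segments`` must be expressed in the frame rate of ``nnet_output``, not of the input features.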
Then in ``DenseFsaVec``, for each supervision (with the corresponding part of ``nnet_output``) we create a ``DenseFsa``. (Hopefully you have understood the format of ``DenseFsa`` from the documentation in ``fsa.h``, but you can also view it as a normal FSA; they are equivalent from the perspective of the FSA concept.) The next step is to call ``intersect_(pruned)`` to intersect the ``DenseFsa`` with the ``decoding_graph`` to get the lattice, and then get the ``tot_scores`` or ``best_path`` for training or decoding.
You may want to check the test code in k2/python/tests or the test code in lhotse to get to know the data formats well. Feel free to ping us if there's any question.
Thank you very much.
@Curisan There is some documentation about the dense FSA vector available at https://k2.readthedocs.io/en/latest/core_concepts/index.html
Please let us know whether it is clear or needs more clarification.
Great!
Could you provide some papers or description about that to help me understand
Here is a paper I just found that is relevant to it:
- Generating exact lattices in the WFST framework, https://www.danielpovey.com/files/2012_icassp_lattices.pdf
Figure 1 from the paper shows what DenseFsaVec looks like. It is called "the search graph of the utterance" in the paper.
In that paper, the DenseFsaVec would be the "acceptor U describing the acoustic scores of an utterance". In k2, so far we are dealing only with state-level lattices, not determinized lattices. The "search graph of the utterance" (S = U o HCLG) is the result of calling IntersectDensePruned().
I see. Thanks.