Fangjun Kuang

Results: 152 issues by Fangjun Kuang

See the code below:

https://github.com/k2-fsa/snowfall/blob/350253144af04c295f560cdb976f817dc13b2993/snowfall/models/transformer.py#L162
https://github.com/k2-fsa/snowfall/blob/350253144af04c295f560cdb976f817dc13b2993/snowfall/models/transformer.py#L167
https://github.com/k2-fsa/snowfall/blob/350253144af04c295f560cdb976f817dc13b2993/snowfall/models/transformer.py#L179
https://github.com/k2-fsa/snowfall/blob/350253144af04c295f560cdb976f817dc13b2993/snowfall/models/transformer.py#L709
https://github.com/k2-fsa/snowfall/blob/350253144af04c295f560cdb976f817dc13b2993/snowfall/models/transformer.py#L720-L721

----

You can see that `ys_in_pad` is padded with eos_id, which is a positive word piece ID. However, it is using...
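For context, the following is a minimal sketch of the usual padding scheme this excerpt refers to; it is my own illustration with made-up IDs and shapes, not the snowfall code: the decoder input `ys_in_pad` is padded with `eos_id` (a real word-piece ID), while the decoder target is padded with an ignore index so that padded positions are excluded from the loss.

```python
# Minimal sketch (not the snowfall code): decoder input padded with eos_id,
# decoder target padded with an ignore index that the loss skips.
import torch
import torch.nn.functional as F

eos_id = 1        # assumed word-piece ID for <eos>
sos_id = 1        # often the same ID is reused for <sos>
IGNORE_ID = -1    # padding value for targets, skipped by the loss

ys = [torch.tensor([5, 9, 7]), torch.tensor([4, 2])]  # toy token sequences

ys_in = [torch.cat([torch.tensor([sos_id]), y]) for y in ys]
ys_out = [torch.cat([y, torch.tensor([eos_id])]) for y in ys]

ys_in_pad = torch.nn.utils.rnn.pad_sequence(
    ys_in, batch_first=True, padding_value=eos_id)       # padded with eos_id
ys_out_pad = torch.nn.utils.rnn.pad_sequence(
    ys_out, batch_first=True, padding_value=IGNORE_ID)   # padded with -1

# Toy decoder output of shape (batch, time, vocab). The loss must use the
# same ignore index as ys_out_pad, otherwise padded positions leak into it.
logits = torch.randn(2, ys_out_pad.size(1), 10)
loss = F.cross_entropy(
    logits.reshape(-1, 10), ys_out_pad.reshape(-1), ignore_index=IGNORE_ID)
```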

It happens only when `--concatenate-cuts=True`. See the problematic code below (line 692):

https://github.com/k2-fsa/snowfall/blob/350253144af04c295f560cdb976f817dc13b2993/snowfall/models/transformer.py#L687-L692

When `--concatenate-cuts=True`, several utterances may be concatenated into one sequence. So `lengths[sequence_idx]` may correspond to multiple utterances....
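To make the failure mode concrete, here is an illustration with made-up numbers (not the snowfall code): with `--concatenate-cuts=True`, one padded sequence can contain several supervision segments, so a single per-sequence length cannot be attributed to any one of them.

```python
# Illustration only, with made-up numbers: one concatenated sequence
# containing two supervision segments (utterances).
lengths = [1000]                     # frames per *sequence* (after concatenation)
supervisions = [                     # segments inside sequence 0
    {"sequence_idx": 0, "start_frame": 0,   "num_frames": 600},
    {"sequence_idx": 0, "start_frame": 600, "num_frames": 400},
]

for sup in supervisions:
    seq_len = lengths[sup["sequence_idx"]]
    # seq_len is 1000 for both supervisions, although the utterances are
    # 600 and 400 frames long, so indexing lengths by sequence_idx conflates
    # the two utterances.
    print(seq_len, sup["num_frames"])
```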

It implements https://github.com/k2-fsa/snowfall/pull/106#issuecomment-803796177

> BTW, since it seems this is hard to get to work, if you feel like it you could work on a simpler idea. In training time...

## With unigram LM for P

```
export CUDA_VISIBLE_DEVICES="0"

./mmi_att_transformer_train.py \
  --master-port=12355 \
  --full-libri=0 \
  --use-ali-model=0 \
  --max-duration=500 \
  --use-unigram=1

./mmi_att_transformer_decode.py \
  --use-lm-rescoring=1 \
  --num-paths=100 \
  --max-duration=300 \
  --use-unigram=1
```
...

From https://github.com/k2-fsa/snowfall/pull/173#issuecomment-833624666

> BTW, someone should have a close look at SpeechBrain to see whether we might be able to use it with Lhotse and k2 as the base for...

A small vocab_size, e.g., 200, is used to avoid OOM if the bigram P is used. After removing P, it is possible to use a large vocab size, e.g., 5000....
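The OOM risk scales with the size of the bigram P: a full bigram over a vocabulary of size V has on the order of V^2 arcs, so the graphs built with P grow roughly quadratically with the vocabulary. A rough back-of-the-envelope comparison (my own estimate, not a measurement from snowfall):

```python
# Rough estimate only: a full bigram LM FSA over a vocabulary of size V has
# on the order of V**2 arcs, so graphs built with P blow up quickly as V grows.
for vocab_size in (200, 5000):
    approx_arcs = vocab_size ** 2
    print(f"vocab_size={vocab_size:5d} -> ~{approx_arcs:,} bigram arcs")
```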

Usage:

```bash
$ snowfall net compute-post -m /ceph-fj/model-jit.pt -f exp/data/cuts_test-clean.json.gz -o exp
```

I find that there is one issue with the Torch Scripted module: We have to know the...
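For reference, a minimal sketch of loading and running such a scripted model; the file name, feature shape, and the assumption that the model maps a feature batch directly to log-posteriors are mine, not from the issue.

```python
# Minimal sketch, assuming the scripted model takes a (batch, time, feature)
# tensor and returns log-posteriors; the file name is just an example.
import torch

model = torch.jit.load("model-jit.pt", map_location="cpu")
model.eval()

features = torch.randn(1, 500, 80)  # made-up batch of 80-dim fbank frames

with torch.no_grad():
    log_posts = model(features)

print(log_posts.shape)
```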

See below (using the latest master)

```
2021-03-29 07:34:23,835 INFO [common.py:270] ================================================================================
2021-03-29 07:34:23,837 INFO [ctc_att_transformer_train.py:440] epoch 0, learning rate 0
Traceback (most recent call last):
  File "./ctc_att_transformer_train.py", line 508,...
```

Closes #96 @danpovey Do you have any idea how to test the code? And I am not sure how the return value is used. ~I am using `mbr_lats` instead of...

Previously, `ans.phones` was always the same as `ans.labels` after `k2.compose` if `inner_labels=phones` was requested. https://github.com/k2-fsa/k2/pull/667 fixed this bug and we need to re-run the training script since the columns of...
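A toy sketch of the behaviour in question (my own example, not code from the issue): with the fix, the attribute requested via `inner_labels` holds the labels matched during composition, rather than being a copy of `ans.labels`.

```python
# Toy example only: compose a CTC topology with a small linear FSA and ask
# k2.compose to expose the matched (inner) labels as `phones`.
import k2

ctc_topo = k2.arc_sort(k2.ctc_topo(max_token=3))
graph = k2.arc_sort(k2.linear_fsa([1, 2, 3]))

ans = k2.compose(ctc_topo, graph, inner_labels="phones")

# Before k2-fsa/k2#667, ans.phones was just a copy of ans.labels; with the
# fix it holds the labels on which the two FSAs were matched.
print(ans.labels)
print(ans.phones)
```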