snowfall
[WIP] Add iterated loss
This PR implements iterated loss from https://github.com/k2-fsa/snowfall/issues/179#issuecomment-830565127. Reference: https://arxiv.org/pdf/1910.10324.pdf
The results below can be reproduced with:
python mmi_att_transformer_train.py --world-size 2 --full-libri 0 --use-ali-model 0 --max-duration 250 --iterated-layers 5 --iterated-scale 0.3
Results for different iterated scales are shown in Table 1. So far they show no clear improvement over the baseline.
- Table 1
iterated scale | test-clean | test-other | test-clean (rescore) | test-other (rescore) |
---|---|---|---|---|
- | 6.74 | 17.18 | 5.63 | 14.92 |
- | 6.78 | 17.49 | 5.76 | 15.31 |
0.01 | 6.71 | 17.34 | 5.6 | 14.86 |
0.05 | 6.58 | 17.35 | 5.69 | 15.06 |
0.30 | 6.57 | 17.6 | 5.61 | 15.38 |
1.00 | 6.77 | 17.69 | 5.8 | 15.58 |
10.00 | 6.93 | 18.31 | 5.88 | 16.22 |
The first two rows are baseline results without iterated loss. I ran the baseline twice to see how much the results vary between runs.
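To make the comparison against run-to-run variation concrete, here is a small sketch that checks each iterated-scale row of Table 1 against the range spanned by the two baseline runs. The WER values are copied from Table 1. The helper name `outside_baseline_range` is hypothetical, and two runs give only a rough estimate of the noise.

```python
# WER values copied from Table 1, in column order:
# (test-clean, test-other, test-clean rescore, test-other rescore)
BASELINES = [
    (6.74, 17.18, 5.63, 14.92),
    (6.78, 17.49, 5.76, 15.31),
]
ITERATED = {
    0.01: (6.71, 17.34, 5.60, 14.86),
    0.05: (6.58, 17.35, 5.69, 15.06),
    0.30: (6.57, 17.60, 5.61, 15.38),
    1.00: (6.77, 17.69, 5.80, 15.58),
    10.00: (6.93, 18.31, 5.88, 16.22),
}
COLUMNS = (
    "test-clean",
    "test-other",
    "test-clean (rescore)",
    "test-other (rescore)",
)


def outside_baseline_range(wers):
    """Hypothetical helper: label each column 'better', 'worse', or 'within'
    relative to the [min, max] range of the two baseline runs."""
    labels = {}
    for i, name in enumerate(COLUMNS):
        lo = min(b[i] for b in BASELINES)
        hi = max(b[i] for b in BASELINES)
        if wers[i] < lo:
            labels[name] = "better"   # lower WER than both baseline runs
        elif wers[i] > hi:
            labels[name] = "worse"    # higher WER than both baseline runs
        else:
            labels[name] = "within"   # inside the baseline spread
    return labels


for scale, wers in ITERATED.items():
    print(scale, outside_baseline_range(wers))
```

With these numbers, scale 0.30 falls below both baselines on test-clean but above both on test-other, which matches the observation that there is no clear improvement.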
Details:
- An extra MMI loss is added after the 6th conformer layer. I also tried adding extra MMI losses after both the 4th and 8th layers; the results are similar and are shown in Table 2.
- The extra MMI loss is not used to update the weights of the bigram LM in the MMI loss. Table 3 compares this with the alternative, where the extra loss does update the bigram LM.
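As a rough illustration of how an iterated scale can enter the objective, the sketch below adds the intermediate-layer losses to the final-layer loss, weighted by the scale. This form is an assumption based on the usual iterated-loss formulation, not a copy of the code in this PR. `combine_losses` is a hypothetical name, and plain floats stand in for the loss tensors.

```python
def combine_losses(final_loss, intermediate_losses, iterated_scale):
    """Assumed form of the objective: final-layer loss plus the scaled sum of
    the losses computed from intermediate layers (hypothetical helper)."""
    return final_loss + iterated_scale * sum(intermediate_losses)


# One extra loss (e.g. after a single intermediate layer), scale 0.3.
print(combine_losses(10.0, [4.0], 0.3))

# Two extra losses (e.g. after two intermediate layers), scale 1.0.
print(combine_losses(10.0, [4.0, 2.0], 1.0))

# No intermediate losses reduces to the baseline objective.
print(combine_losses(10.0, [], 0.3))
```

Under this form, a scale of 0 recovers the baseline, and a large scale such as 10.00 lets the intermediate losses dominate the final-layer loss.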
Extra results:
- Table 2 (Add extra mmi losses after 4th and 8th layers)
iterated scale | test-clean | test-other | test-clean (rescore) | test-other (rescore) |
---|---|---|---|---|
- | 6.74 | 17.18 | 5.63 | 14.92 |
- | 6.78 | 17.49 | 5.76 | 15.31 |
0.30 | 6.61 | 17.65 | 5.65 | 15.52 |
1.00 | 6.66 | 18.13 | 5.68 | 15.87 |
10.00 | 6.75 | 18.43 | 5.74 | 16.25 |
- Table 3 (Update bigram using extra mmi loss or not)
model | test-clean | test-other | test-clean (rescore) | test-other (rescore) |
---|---|---|---|---|
no update (iterated scale 0.30 row of Table 1) | 6.57 | 17.6 | 5.61 | 15.38 |
+ update bigram with extra loss | 6.75 | 17.77 | 5.69 | 15.53 |