icefall
Update DoubleSwish
This PR updates the DoubleSwish function: it replaces x * sigmoid(x-1) with x * sigmoid(x-1) - 0.05x.
Experimental results from training Dan's Zipformer on train-clean-100 show that the modified function slightly outperforms the baseline:
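For reference, the two formulas above can be written as a minimal scalar sketch. This is plain Python for illustration only, not the icefall implementation; the function names and the `linear_coeff` parameter are hypothetical.

```python
import math


def sigmoid(x: float) -> float:
    """Numerically stable logistic function for a scalar input."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def double_swish(x: float) -> float:
    """Baseline formula from the description: x * sigmoid(x - 1)."""
    return x * sigmoid(x - 1.0)


def modified_double_swish(x: float, linear_coeff: float = 0.05) -> float:
    """Modified formula: x * sigmoid(x - 1) - 0.05 * x.

    `linear_coeff` is a hypothetical parameter name used only in this sketch.
    """
    return double_swish(x) - linear_coeff * x


if __name__ == "__main__":
    for x in (-2.0, 0.0, 1.0, 3.0):
        print(x, double_swish(x), modified_double_swish(x))
```

The only difference between the two functions is the extra linear term, so they agree at x = 0 and diverge by 0.05 * |x| elsewhere.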
epoch-20-avg-5:
- Baseline, 6.38 & 17.29
- Modified DoubleSwish, 6.33 & 17.15
epoch-25-avg-10:
- Baseline, 6.04 & 16.41
- Modified DoubleSwish, 6.02 & 16.28
epoch-30-avg-13:
- Baseline, 6.0 & 15.85
- Modified DoubleSwish, 5.97 & 15.78
- [x] Make it compatible with existing recipes.
- [x] Enable saving and loading the alpha attribute using the _metadata in torch.nn.Module (https://github.com/pytorch/pytorch/blob/f62e54df8f04362db96154a94137d6afbdfaa953/torch/nn/modules/module.py#L404)
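One way a non-tensor attribute could be carried through the state dict's `_metadata` is sketched below. This is an assumption about the general mechanism, not the code in this PR: the class name `ScaledActivation`, the way `alpha` enters the forward pass, and the fallback value of 0.0 for checkpoints without the entry are all hypothetical. It relies on the private `_save_to_state_dict` and `_load_from_state_dict` hooks of `torch.nn.Module`, whose details may differ between PyTorch versions.

```python
import torch
import torch.nn as nn


class ScaledActivation(nn.Module):
    """Hypothetical activation with a plain Python float attribute `alpha`."""

    def __init__(self, alpha: float = 0.0) -> None:
        super().__init__()
        self.alpha = alpha

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Assumed form for this sketch: alpha scales an extra linear term.
        return x * torch.sigmoid(x - 1.0) + self.alpha * x

    def _save_to_state_dict(self, destination, prefix, keep_vars):
        super()._save_to_state_dict(destination, prefix, keep_vars)
        # state_dict() attaches an `_metadata` mapping keyed by module prefix;
        # store alpha alongside the existing per-module "version" entry.
        metadata = getattr(destination, "_metadata", None)
        if metadata is not None:
            metadata.setdefault(prefix[:-1], {})["alpha"] = self.alpha

    def _load_from_state_dict(self, state_dict, prefix, local_metadata,
                              strict, missing_keys, unexpected_keys,
                              error_msgs):
        # Checkpoints saved without the entry fall back to 0.0 (assumed here
        # to mean "old behaviour") so that they still load.
        self.alpha = local_metadata.get("alpha", 0.0)
        super()._load_from_state_dict(state_dict, prefix, local_metadata,
                                      strict, missing_keys, unexpected_keys,
                                      error_msgs)


if __name__ == "__main__":
    src = nn.Sequential(ScaledActivation(alpha=-0.05))
    dst = nn.Sequential(ScaledActivation(alpha=0.0))
    dst.load_state_dict(src.state_dict())
    print(dst[0].alpha)
```

Because `alpha` is stored in the metadata rather than as a parameter or buffer, it adds no new keys to the state dict, so strict loading of older checkpoints is unaffected in this sketch.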
Looks great to me.
could you also update https://github.com/k2-fsa/icefall/blob/master/egs/librispeech/ASR/RESULTS-100hours.md
OK. I will do that.
I'm having slight second thoughts about merging this, since the improvement was quite small. It's OK to merge if you're very confident that it's not going to cause us problems in the future, e.g. due to mistakes in the back-compatibility code, or runtime issues, if relevant. Make sure that anything inside "if torch.jit.is_scripting()" is correct.