Update DoubleSwish

Open yaozengwei opened this issue 2 years ago • 4 comments

This PR updates the DoubleSwish function: it replaces x * sigmoid(x-1) with x * sigmoid(x-1) - 0.05 * x.
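To make the change concrete, here is a minimal scalar sketch of the two activations in plain Python. It is not the icefall implementation (which operates on tensors); the function names and the `linear_coeff` parameter are hypothetical, chosen only for illustration.

```python
import math


def sigmoid(x: float) -> float:
    """Numerically stable logistic function for a single float."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def double_swish(x: float) -> float:
    """Baseline from the description: x * sigmoid(x - 1)."""
    return x * sigmoid(x - 1.0)


def modified_double_swish(x: float, linear_coeff: float = -0.05) -> float:
    """Modified form from the description: x * sigmoid(x - 1) - 0.05 * x.

    `linear_coeff` is a hypothetical name for the coefficient of the
    added linear term; the default -0.05 is the value quoted in the PR.
    """
    return x * sigmoid(x - 1.0) + linear_coeff * x


if __name__ == "__main__":
    for x in (-10.0, -1.0, 0.0, 1.0, 2.0):
        print(f"x={x:6.1f}  baseline={double_swish(x):9.5f}  "
              f"modified={modified_double_swish(x):9.5f}")
```

The only difference between the two is the linear term, so for large negative inputs, where the baseline approaches zero, the modified function keeps a slope of about -0.05 instead of flattening out.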

Experimental results on Dan's Zipformer training on train-clean-100 show that it can slightly outperform the baseline:

epoch-20-avg-5:

  • Baseline, 6.38 & 17.29
  • Modified DoubleSwish, 6.33 & 17.15

epoch-25-avg-10:

  • Baseline, 6.04 & 16.41
  • Modified DoubleSwish, 6.02 & 16.28

epoch-30-avg-13:

  • Baseline, 6.0 & 15.85
  • Modified DoubleSwish, 5.97 & 15.78
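To put the size of the gain in perspective, the relative reduction for each pair of numbers above can be computed as follows. This sketch assumes the two values in each row are error rates in percent on two test sets (the thread does not label them), and the variable names are illustrative only.

```python
# (baseline, modified) pairs copied from the results above.
# Assumption: each pair holds error rates (%) on two test sets.
results = {
    "epoch-20-avg-5": ((6.38, 17.29), (6.33, 17.15)),
    "epoch-25-avg-10": ((6.04, 16.41), (6.02, 16.28)),
    "epoch-30-avg-13": ((6.0, 15.85), (5.97, 15.78)),
}


def relative_reduction(baseline: float, modified: float) -> float:
    """Relative reduction in percent: 100 * (baseline - modified) / baseline."""
    return 100.0 * (baseline - modified) / baseline


reductions = {
    name: tuple(relative_reduction(b, m) for b, m in zip(base, mod))
    for name, (base, mod) in results.items()
}

for name, (first, second) in reductions.items():
    print(f"{name}: {first:.2f}% / {second:.2f}% relative reduction")
```

Under that assumption, every reduction comes out below 1% relative.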

  • [x] Make it compatible with existing recipes.

  • [x] Enable saving and loading of the alpha attribute via the _metadata in torch.nn.Module (https://github.com/pytorch/pytorch/blob/f62e54df8f04362db96154a94137d6afbdfaa953/torch/nn/modules/module.py#L404)

yaozengwei avatar Dec 02 '22 14:12 yaozengwei

Looks great to me.

csukuangfj avatar Dec 02 '22 14:12 csukuangfj

could you also update https://github.com/k2-fsa/icefall/blob/master/egs/librispeech/ASR/RESULTS-100hours.md

csukuangfj avatar Dec 02 '22 15:12 csukuangfj

> could you also update https://github.com/k2-fsa/icefall/blob/master/egs/librispeech/ASR/RESULTS-100hours.md

OK. I will do that.

yaozengwei avatar Dec 02 '22 15:12 yaozengwei

I'm having slight second thoughts about merging this, since the improvement was quite small. It's OK to merge if you're very confident that it's not going to cause us problems in future, e.g. due to mistakes in the back-compatibility code, or runtime issues, if relevant. Make sure that anything inside "if torch.jit.is_scripting()" is correct.

danpovey avatar Dec 03 '22 07:12 danpovey