attention-transfer
Strategy for decaying α and β during training
@szagoruyko @EderSantana Hi, thanks for sharing your code. Could you please describe how you decay the two multipliers α and β during training? Thanks in advance.