RationalNets Training with Rational Activations on very deep ResNets.

Open 23Uday opened this issue 1 year ago • 1 comments

Hi, I am using your PyTorch implementation to train a Rational ResNet-164 on CIFAR-10. The model behaves well for ResNets with 18-38 layers, but I cannot get very deep ResNets to train without dramatically lowering the learning rate. Here is one example with --lr 1e-6 --wd 1e-5, where the loss becomes NaN within the first epoch:

    Train Epoch: 0 [0/47500 (0%)] Loss: 2.517
    Train Epoch: 0 [1920/47500 (4%)] Loss: nan

I understand that a model with rational activations is supposed to represent a rational function of degree 3^layers, but the training process for deeper models isn't clear to me. Could you give me some help?
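To make the 3^layers remark concrete, here is a minimal sketch in plain Python (not the repository's code) that composes a scalar rational function with itself and tracks the polynomial degree. The activation coefficients, a type (3, 2) function (x^3 + 1) / (x^2 + 1), and all helper names are assumptions chosen for illustration; weights, biases, and the residual connections of a real network are ignored.

```python
# Polynomials are lists of integer coefficients, lowest degree first.

def poly_mul(a, b):
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out

def poly_add(a, b):
    n = max(len(a), len(b))
    a = a + [0] * (n - len(a))
    b = b + [0] * (n - len(b))
    return [x + y for x, y in zip(a, b)]

def poly_pow(a, k):
    out = [1]
    for _ in range(k):
        out = poly_mul(out, a)
    return out

def degree(a):
    for i in range(len(a) - 1, -1, -1):
        if a[i] != 0:
            return i
    return 0

def compose(outer, inner):
    """Return (N, D) with N/D = P(p/q) / Q(p/q), cleared of inner fractions."""
    P, Q = outer
    p, q = inner
    n = max(degree(P), degree(Q))

    def homogenize(C):
        # sum_i C[i] * p^i * q^(n - i)
        total = [0]
        for i, c in enumerate(C):
            if c == 0:
                continue
            term = poly_mul(poly_pow(p, i), poly_pow(q, n - i))
            total = poly_add(total, [c * t for t in term])
        return total

    return homogenize(P), homogenize(Q)

# Hypothetical activation: (x^3 + 1) / (x^2 + 1), numerator degree 3.
ACT = ([1, 0, 0, 1], [1, 0, 1])

def degree_after_layers(k):
    """Degree of ACT composed with itself k times (k >= 1)."""
    r = ACT
    for _ in range(k - 1):
        r = compose(ACT, r)
    return max(degree(r[0]), degree(r[1]))

if __name__ == "__main__":
    for k in range(1, 5):
        print(k, degree_after_layers(k))
```

The coefficients are all non-negative so that leading terms cannot cancel, which keeps the degree count exact in this toy setting. The same exponential growth is what makes the output of a deep stack of such activations very sensitive to large inputs.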

May 28 '23 19:05 23Uday
