RationalNets
Training with Rational Activations on very deep ResNets.
Hi, I am using your PyTorch implementation to train a Rational ResNet-164 on CIFAR-10. While the model trains well at 18-38 layers, I cannot get very deep ResNets to train without dramatically lowering the learning rate. Here is one example with `--lr 1e-6 --wd 1e-5`:

```
Train Epoch: 0 [0/47500 (0%)]    Loss: 2.517
Train Epoch: 0 [1920/47500 (4%)] Loss: nan
```

I understand that a model with rational activations is supposed to represent a rational function of degree 3^layers, but the training procedure for deeper models isn't clear to me. Could you give me some pointers?
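For context on why I suspect depth is the culprit: composing degree-3 maps multiplies degrees, so a 164-layer stack behaves like an astronomically high-degree function, which could plausibly explain the numerical blow-up. A toy sketch of the degree growth (plain polynomials as a stand-in for the actual rational units):

```python
import numpy as np

# A degree-3 polynomial standing in for one rational activation.
p = np.poly1d([0.05, 0.2, 1.0, 0.1])

# Stacking L layers composes the map with itself; the degree of a
# composition is the product of the degrees, so it grows like 3**L.
comp = p
for _ in range(3):              # compose 3 more times -> 4 "layers" total
    comp = np.polyval(p, comp)  # polyval on a poly1d returns the composition

print(comp.order)               # 3**4 = 81
```

Even at 4 layers the effective degree is already 81; at 164 layers it would be 3**164, which is why I assume small weight perturbations can send activations to NaN.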