César Laurent
Unfortunately, I have no idea. If the model was dumped with the old serialization, then it is likely that it won't be loaded properly with the new serialization mechanism, so...
For group convolutions, the difference is that the basis of the input channels (`kfe_x`) is block diagonal, with one block per group. So that can be implemented in...
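To illustrate the block-diagonal structure, here is a minimal NumPy sketch. It is not the repository's code: the helper `block_diag_basis` is hypothetical, and it assumes the input features are laid out group by group and that each group's basis comes from an eigendecomposition of that group's input covariance.

```python
import numpy as np

def block_diag_basis(x, groups):
    """Assemble a block-diagonal basis from per-group eigenbases.

    x: (n_samples, n_features) array; features are assumed to be ordered
       group by group, with n_features divisible by `groups`.
    """
    n, d = x.shape
    assert d % groups == 0
    size = d // groups
    kfe_x = np.zeros((d, d))
    for k in range(groups):
        sl = slice(k * size, (k + 1) * size)
        # Input covariance restricted to the features of group k.
        cov = x[:, sl].T @ x[:, sl] / n
        # eigh returns orthonormal eigenvectors of a symmetric matrix.
        _, vecs = np.linalg.eigh(cov)
        kfe_x[sl, sl] = vecs
    return kfe_x

rng = np.random.default_rng(0)
x = rng.standard_normal((64, 6))  # toy inputs: 6 features, 3 groups of 2
kfe_x = block_diag_basis(x, groups=3)
```

Each block is orthogonal on its own, so the assembled `kfe_x` is orthogonal too, and the entries that would mix features from different groups stay zero.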
> I train ResNet18 and ResNet50 on ImageNet and CIFAR-10 using EKFAC. However, the test accuracy is just as good as the SGD method with momentum and weight decay. I...
`eps` can be seen as balancing between K-FAC and SGD, so if you use a high `eps`, you bias your optimizer towards plain SGD (so that explains why you have...
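A toy sketch of that effect, assuming `eps` enters as plain Tikhonov damping, i.e. the update solves `(F + eps * I) u = g`. The matrix `F` below is a random stand-in for the curvature estimate, and the helper names are hypothetical, not part of the optimizer's API.

```python
import numpy as np

def damped_update(F, g, eps):
    """Solve (F + eps * I) u = g for the update direction u."""
    return np.linalg.solve(F + eps * np.eye(F.shape[0]), g)

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 5))
F = A @ A.T                 # toy symmetric positive semi-definite matrix
g = rng.standard_normal(5)  # toy gradient

# Alignment between the damped update and the raw gradient for several eps.
cos_by_eps = {eps: cosine(damped_update(F, g, eps), g)
              for eps in (1e-3, 1.0, 1e3, 1e6)}
```

As `eps` grows, `F + eps * I` is dominated by `eps * I`, so the update tends to `g / eps`: the raw gradient direction with a smaller effective step size.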