EKFAC-pytorch

how to deal with depthwise conv

Open yifan123 opened this issue 5 years ago • 5 comments

Thanks for the great work. I want to use EKFAC to train MobileNet. However, it contains depthwise convolutions, whose groups parameter is not equal to 1. How can this be implemented?
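(For reference, a depthwise convolution in PyTorch is just a grouped convolution with groups equal to in_channels, so each channel is filtered independently; the layer below is only an illustrative example, not from MobileNet itself:)

```python
import torch.nn as nn

# Depthwise convolution: groups == in_channels == out_channels,
# i.e. one filter per input channel.
depthwise = nn.Conv2d(in_channels=32, out_channels=32,
                      kernel_size=3, padding=1, groups=32)
```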

yifan123 avatar Apr 03 '20 16:04 yifan123

For group convolutions, the difference is that the basis of the input channels (kfe_x) is block diagonal, with one block per group. So this can be implemented in two different ways:

  1. Change the computation of xxt so that it is done per group of input channels, rather than over all input channels. Then perform the remaining operations (inversions, matrix products) using batched operations (see the sketch after this list).

  2. Or, change the way the matrix-matrix products involving kfe_x are performed, to make sure they respect the group structure of the convolution.
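A minimal sketch of option 1 (this is not the library's API; the function name grouped_kfe_x is hypothetical, and x is assumed to be the already-unfolded input of shape (samples, features), with features divisible by groups and spatial kernel dimensions folded into features):

```python
import torch

def grouped_kfe_x(x, groups):
    # x: (samples, features), unfolded conv input; features % groups == 0.
    n, f = x.shape
    fg = f // groups                           # features per group
    # Split the feature dimension into per-group blocks: (groups, n, fg)
    xg = x.view(n, groups, fg).transpose(0, 1)
    # Per-group covariance = the diagonal blocks of xxt: (groups, fg, fg)
    xxt = torch.einsum('gni,gnj->gij', xg, xg) / n
    # Batched eigendecomposition: one eigenbasis (kfe_x block) per group
    evals, kfe_x = torch.linalg.eigh(xxt)
    return evals, kfe_x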

Thrandis avatar Apr 05 '20 15:04 Thrandis

Cool! I have a couple more questions:

  1. I train ResNet18 and ResNet50 on ImageNet and CIFAR-10 using EKFAC. However, the test accuracy is only as good as SGD with momentum and weight decay. I just use EKFAC as a preconditioner and do not change any of the hyperparameters. It seems that EKFAC does not show the improvement claimed in the paper, and it increases the training time; changing the update frequency doesn't help either. This is my setting: preconditioner = EKFAC(self.model, 0.1, ra=True, sua=False, alpha=0.75, update_freq=100)

  2. Group convolutions are now a very common structure in convolutional neural networks such as EfficientNet, NASNet, and MobileNet, so supporting them seems necessary. I'm sorry, but it is hard for me to implement group convolution support in EKFAC from your description. Will group convolutions be supported in the future?

Thank you so much for the nice reply!!!

yifan123 avatar Apr 05 '20 16:04 yifan123

I train ResNet18 and ResNet50 on ImageNet and CIFAR-10 using EKFAC. However, the test accuracy is only as good as SGD with momentum and weight decay. I just use EKFAC as a preconditioner and do not change any of the hyperparameters. It seems that EKFAC does not show the improvement claimed in the paper, and it increases the training time; changing the update frequency doesn't help either. This is my setting: preconditioner = EKFAC(self.model, 0.1, ra=True, sua=False, alpha=0.75, update_freq=100)

The eps hyper-parameter probably needs some tuning. Also, are you using Batch Normalization? If so, it is not exactly clear how EKFAC should be applied when using Batch Normalization.
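For instance, one could sweep eps (the second positional argument in the call above) over a log-spaced grid; the values below are purely illustrative, not recommendations:

```python
# Hypothetical eps sweep around the user's reported setting.
for eps in (1e-4, 1e-3, 1e-2, 1e-1):
    preconditioner = EKFAC(self.model, eps, ra=True, sua=False,
                           alpha=0.75, update_freq=100)
    # ... train and evaluate with this preconditioner ...
```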

Group convolutions are now a very common structure in convolutional neural networks such as EfficientNet, NASNet, and MobileNet, so supporting them seems necessary. I'm sorry, but it is hard for me to implement group convolution support in EKFAC from your description. Will group convolutions be supported in the future?

I would be happy to review and merge any PR you propose. However, I don't have time to work on that myself.

Thrandis avatar Apr 05 '20 16:04 Thrandis

I'm using Batch Normalization. In my experiments, changing eps hurt the performance. Thanks for your reply 😀, I will try it.

yifan123 avatar Apr 05 '20 16:04 yifan123

eps can be seen as balancing between K-FAC and SGD: if you use a high eps, you bias your optimizer towards plain SGD (which explains why you get results similar to SGD, only with slower training).
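As a toy illustration (made-up eigenvalues, not taken from the library): in the eigenbasis, EKFAC rescales each gradient coordinate by 1 / (s_i + eps), so a large eps makes the rescaling nearly uniform, i.e. a plain gradient with a smaller step size:

```python
import torch

# Hypothetical curvature eigenvalues and a gradient in the eigenbasis.
s = torch.tensor([10.0, 1.0, 0.01])
g = torch.ones(3)

for eps in (1e-3, 1e-1, 1e1):
    # EKFAC-style rescaling: 1 / (s_i + eps) per coordinate.
    # For large eps this tends to g / eps, i.e. scaled plain SGD.
    print(eps, g / (s + eps))
```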

Thrandis avatar Apr 05 '20 20:04 Thrandis