
Performance of FAN_tiny on ImageNet1K

Open bfshi opened this issue 1 year ago • 5 comments

Hi, congratulations on the cool work!

One question about the code: when I train fan_tiny_12_p16_224 on IN1K, I get a clean accuracy of 77.454, lower than the reported 79.2. I followed all the hyperparameter settings in the README, except that I trained the model on 4 GPUs, each with a batch size of 200. Would that severely affect the performance? Or is there any other possible reason? Thanks!

bfshi avatar Oct 16 '22 19:10 bfshi

Hi,

Thanks for your interest in the work! Based on our previous experimental experience, the tiny model needs to be trained with a large batch size (e.g. 1024) for 300 epochs.

In your case, the network has probably not converged yet. You can do a sanity check by looking at the training loss and the validation loss: if they are still decreasing, that supports this explanation.
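For example, a quick way to inspect this (a minimal sketch, assuming a timm-style summary.csv log; the exact file location and column names in this repo may differ):

```python
# Sketch: plot training/validation loss from a timm-style summary.csv
# ('train_loss' / 'eval_loss' column names are an assumption; adjust to your log).
import pandas as pd
import matplotlib.pyplot as plt

log = pd.read_csv("output/train/fan_tiny_12_p16_224/summary.csv")  # hypothetical path
plt.plot(log["epoch"], log["train_loss"], label="train loss")
plt.plot(log["epoch"], log["eval_loss"], label="val loss")
plt.xlabel("epoch")
plt.ylabel("loss")
plt.legend()
plt.show()  # both curves still trending down => not converged yet
```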

If that's the case, you can simply increase the number of epochs to compensate for the impact of the smaller batch size.
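As a rough illustration (a common heuristic, not the exact recipe from the paper or README), you can also scale the base learning rate with the effective batch size and extend the schedule if the loss is still decreasing at epoch 300:

```python
# Sketch of a common heuristic (an assumption, not the authors' recipe):
# scale the base LR linearly with the effective batch size and train longer
# if the loss curves show the run has not converged after 300 epochs.
ref_batch_size = 1024      # batch size behind the reported numbers
ref_lr = 1e-3              # hypothetical reference LR; use the value from the README
my_batch_size = 4 * 200    # 4 GPUs x 200 images per GPU = 800

scaled_lr = ref_lr * my_batch_size / ref_batch_size
my_epochs = 350            # e.g. extend 300 -> 350 if still improving
print(f"lr={scaled_lr:.2e}, epochs={my_epochs}")
```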

I hope this helps with your experiments.

Regards, DQ

zhoudaquan avatar Oct 18 '22 03:10 zhoudaquan

Hi,

Thanks for the response! I used 4 GPUs with batch_size_per_gpu=200, so the total batch size is 800, which is not far from the 1024 you used, so I don't think that is the problem. I also double-checked whether the model had converged: the loss barely changed over the last 30 epochs, so I assume it did. I haven't found the reason yet, but I will try training a larger model to see if the problem persists.

Thanks!

bfshi avatar Oct 21 '22 05:10 bfshi

Hi! I've tried training FAN-S and can reproduce the results in the paper. However, when training FAN-L, the validation accuracy peaks at ~83.5 around epoch 200 and then falls back to ~82.3 by the end of the 300 epochs. Is this supposed to happen? I trained with batch_size_per_gpu=150 on 8 GPUs. All other configurations follow the ones in the repo. Thanks!

bfshi avatar Nov 02 '22 21:11 bfshi

Hi, based on my previous experience, this typically indicates overfitting. You can try increasing the drop path rate.
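For example (a minimal sketch, assuming the repo's model file has been imported so the fan_* names are registered with timm and the models accept a drop_path_rate argument; check the model definitions for the exact name and default):

```python
# Sketch: raise stochastic depth (drop path) to regularize a run that overfits
# late in training. The model name below is the tiny variant from this thread;
# substitute the FAN-L model name you are actually training.
from timm import create_model

model = create_model(
    "fan_tiny_12_p16_224",
    pretrained=False,
    drop_path_rate=0.3,   # hypothetical value, higher than the default; tune as needed
)
```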

Regards, DQ

zhoudaquan avatar Nov 03 '22 01:11 zhoudaquan

Thanks for the suggestion! I will try that.

bfshi avatar Nov 03 '22 02:11 bfshi