Cannot reproduce the reported mIoU of Segformer on Cityscapes.

Open · fingertap opened this issue on Aug 14, 2022 · 8 comments

I cannot reproduce the 82.25 mIoU of Segformer on Cityscapes on my machine. I am using 8 RTX 3090 GPUs, with --seed 0 and --deterministic.

The log is attached.

Any ideas? Thanks in advance.
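
(For readers unfamiliar with these flags: in mmsegmentation 0.x, tools/train.py forwards --seed and --deterministic to a helper that seeds all RNGs. Below is a simplified sketch of mmcv's set_random_seed, not the full training entry point.)

```python
import random

import numpy as np
import torch


def set_random_seed(seed: int, deterministic: bool = False) -> None:
    """Seed Python, NumPy, and PyTorch RNGs; optionally force
    deterministic cuDNN kernels at some speed cost."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    if deterministic:
        torch.backends.cudnn.deterministic = True
        torch.backends.cudnn.benchmark = False
```

Note that this pins the framework-level RNGs, but it does not make every CUDA kernel deterministic.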

fingertap · Aug 14 '22 00:08

The learning rate seems not right: it stopped decreasing at 1e-5 and became 0.0 afterwards.

AlexWang1900 · Aug 14 '22 10:08

Why does this happen? I did not modify the scheduler settings.

fingertap · Aug 14 '22 10:08

I'm afraid this is not the reason. The learning rate becomes zero in the benchmark too.
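
For context, the learning rate hitting 0.0 at the end is the schedule working as configured: the Segformer Cityscapes configs use policy='poly' with min_lr=0.0, so the learning rate decays smoothly to zero at the final iteration. A minimal sketch of that decay (assuming power=1.0 and the 160k-iteration schedule; warmup omitted):

```python
def poly_lr(base_lr: float, cur_iter: int, max_iters: int,
            power: float = 1.0, min_lr: float = 0.0) -> float:
    """Poly decay as used by mmcv's `policy='poly'` LR hook."""
    coeff = (1 - cur_iter / max_iters) ** power
    return (base_lr - min_lr) * coeff + min_lr


base_lr, max_iters = 6e-5, 160_000
for it in (0, 80_000, 158_000, 160_000):
    print(f'iter {it}: lr = {poly_lr(base_lr, it, max_iters):.2e}')
# iter 0: lr = 6.00e-05 ... iter 160000: lr = 0.00e+00
```

So a final learning rate of 0.0 is expected and unlikely to explain the mIoU gap.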

fingertap · Aug 14 '22 10:08

Hi @fingertap, thanks for your report. We're working on reproducing the result and will get back to you soon. However, training results are rarely identical across runs, and it is normal for them to fluctuate within a small range.

xiexinch · Aug 15 '22 03:08

I really appreciate your work on building a large-scale benchmark like this! I have borrowed many ideas from mmcv when implementing and organizing my own code.

However, the gap is too large for me to dismiss. On Cityscapes, 82.25 is way above 81.9, since models are approaching the performance ceiling of this dataset. The results should be identical when the seed and the deterministic flag are set; otherwise, I'm afraid these options are not useful. This is critical for people trying to use your benchmark in their research: if running the same config cannot reproduce the same results, how can they even improve over the baseline?
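
One caveat worth adding (a general PyTorch note, not specific to mmsegmentation): --deterministic only configures cuDNN. Other CUDA ops may lack deterministic implementations and can still cause run-to-run drift; PyTorch >= 1.8 can be told to raise an error whenever such an op is hit:

```python
import os

# cuBLAS needs this set before CUDA initializes in order to make
# matrix multiplies deterministic (CUDA >= 10.2).
os.environ['CUBLAS_WORKSPACE_CONFIG'] = ':4096:8'

import torch

# Error out on any op that would fall back to a nondeterministic
# kernel, instead of silently diverging between runs.
torch.use_deterministic_algorithms(True)
```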

fingertap · Aug 15 '22 03:08

We are re-running the model and will get back to you asap.

xiexinch · Aug 15 '22 04:08

Hi @fingertap, we reran the model twice with seed=0, using different settings of the deterministic flag. The corresponding logs are attached: 20220814_211546.txt and 20220815_114307.txt

xiexinch · Aug 17 '22 03:08

Hi @xiexinch, what a huge gap between my runs and yours! Actually, without the --deterministic flag I got an even worse score; I will attach the log later. Any ideas on this? If hardware differences have such a huge impact, comparisons between methods may be unfair. For what it's worth, an improved version of OCRNet with 40k iterations achieves 81.93 on my machine, and Segformer should outperform OCRNet, in my opinion.
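
A natural first step for a gap like this is to diff the two environments, since PyTorch/CUDA/cuDNN versions and the GPU model all affect numerics even with identical seeds. mmseg prints this block at the top of every training log; it can also be dumped directly via mmcv's collect_env (a sketch assuming the mmcv 1.x API):

```python
from mmcv.utils import collect_env

# Prints PyTorch / CUDA / cuDNN versions, GPU model, compiler flags,
# etc.: the same environment block found at the top of mmseg logs.
for name, value in collect_env().items():
    print(f'{name}: {value}')
```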

fingertap · Aug 17 '22 03:08