The performance
Hi, thanks for your great work. I trained lawin + MiT-B2 for 80k iterations and the final performance is 46.64 mIoU. The training protocols are exactly the same as SegFormer's. Here is the log file: 20220402_071143.log
Hi ydhong. Have you tried training lawin-B2 for 160k iterations? The performance reported in Table 1 is obtained with a 160k training course.
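For context, here is a minimal sketch of the relevant schedule settings in an mmseg-style config, following SegFormer's published training protocol (AdamW with a poly schedule); the values are taken from SegFormer's configs and assumed to carry over here, and switching from the 80k to the 160k course only changes `max_iters`:

```python
# Sketch of SegFormer-style schedule settings in an mmseg config
# (illustrative; check the official SegFormer configs for exact values).
optimizer = dict(
    type='AdamW', lr=6e-5, betas=(0.9, 0.999), weight_decay=0.01,
    paramwise_cfg=dict(custom_keys={
        'pos_block': dict(decay_mult=0.),
        'norm': dict(decay_mult=0.),
        'head': dict(lr_mult=10.)}))
lr_config = dict(policy='poly', warmup='linear', warmup_iters=1500,
                 warmup_ratio=1e-6, power=1.0, min_lr=0.0, by_epoch=False)
# 80k vs. 160k training course: only max_iters changes
runner = dict(type='IterBasedRunner', max_iters=160000)
checkpoint_config = dict(by_epoch=False, interval=16000)
evaluation = dict(interval=16000, metric='mIoU')
```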
Thanks for your reply. In my experience, training for 160k iterations doesn't improve the results much. Anyway, I will try it and see. Besides, I trained lawin + CSWin-T for 160k iterations and still obtained no obvious improvement. By the way, should embed_dim*3 here https://github.com/yan-hao-tian/lawin/blob/30d3cdb20d6faf03e3eac11c2c23de4fbb5639fe/lawin_head.py#L148 be changed to 512?
Yes, it should be 512. Also, I recommend switching the proj_type in PatchEmbed from 'pool' to 'conv', which uses a group conv in place of the mix pooling at very little extra cost.
https://github.com/yan-hao-tian/lawin/blob/92380f80a7e98b44207378dc6cfabf8dcb03f6eb/lawin_head.py#L185-L187
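For readers following along, here is a hypothetical sketch of what the two proj_type options amount to; this is a re-implementation for illustration, not the repo's actual code, and the `dim`, `patch_size`, and group count are assumptions:

```python
import torch
import torch.nn as nn

class PatchEmbed(nn.Module):
    """Illustrative sketch of the proj_type switch discussed above
    (hypothetical; dim, patch_size, and groups are assumed values)."""

    def __init__(self, dim=512, patch_size=8, proj_type='pool'):
        super().__init__()
        self.proj_type = proj_type
        if proj_type == 'conv':
            # group conv downsampling: cheap because each group only
            # mixes dim // groups channels
            self.proj = nn.Conv2d(dim, dim, kernel_size=patch_size,
                                  stride=patch_size, groups=dim // 16)
        else:
            self.avg_pool = nn.AvgPool2d(patch_size, stride=patch_size)
            self.max_pool = nn.MaxPool2d(patch_size, stride=patch_size)
            # learnable blend weight between average and max pooling
            self.mix = nn.Parameter(torch.tensor(0.5))

    def forward(self, x):
        if self.proj_type == 'conv':
            return self.proj(x)
        # mix pooling: convex combination of average and max pooling
        return self.mix * self.avg_pool(x) + (1 - self.mix) * self.max_pool(x)
```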
By the way, what is the competitor for lawin + CSWin-T? UperNet or Semantic FPN?
The baseline is CSWin-T + MLP decoder, similar to SegFormer. I just want to reproduce the results of your paper, in which you said you use the pooling. How much gain does using group conv instead of mix pooling bring?
During the inference stage, I found that your model requires the input resolution to be a multiple of 64. For ADE20K, I use 'ResizeToMultiple' in mmseg to achieve this. There may be other details that cannot be presented in the paper. So when are you going to release the code? Thanks again.
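For reference, a minimal sketch of an mmseg test pipeline using ResizeToMultiple to enforce a 64-divisible input; the surrounding transforms are the usual ADE20K defaults, assumed here rather than taken from this repo:

```python
# Sketch of an mmseg test pipeline enforcing a 64-divisible input
# (surrounding transforms are standard ADE20K defaults, assumed here).
img_norm_cfg = dict(
    mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)
test_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(
        type='MultiScaleFlipAug',
        img_scale=(2048, 512),
        flip=False,
        transforms=[
            dict(type='Resize', keep_ratio=True),
            # round each side to a multiple of 64, as the model requires
            dict(type='ResizeToMultiple', size_divisor=64),
            dict(type='RandomFlip'),
            dict(type='Normalize', **img_norm_cfg),
            dict(type='ImageToTensor', keys=['img']),
            dict(type='Collect', keys=['img']),
        ])
]
```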
Sorry for the late reply. Honestly, we have to delay the full code release because lawin has not been accepted by any conference or journal so far, and we are currently writing a new version of the paper.
Sorry to hear that. I notice that your reproduced results for Swin are much higher than the original ones. For example, your UperNet-Swin-B achieves 53.0 mIoU, 1.4 higher than the original 51.6. Could you send me your training config files for Swin and the corresponding lawin-Swin? My email: [email protected]