
Any information on comparison with MVANet?

Open wang21jun opened this issue 9 months ago • 4 comments

What are the advantages of BiRefNet compared with the work [Multi-view Aggregation Network for Dichotomous Image Segmentation] (MVANet)?

wang21jun avatar May 08 '24 08:05 wang21jun

MVANet is an interesting and good work, whose results in their paper are even better than ours on DIS5K. Compared with it, BiRefNet:

  1. has a simpler architecture: MVANet needs to crop a whole image into patches for parallel feature forwarding, which may not be easy to adapt when bs > 1.
  2. runs more comprehensive experiments on many HR tasks (DIS, HRSOD, and COD), with the same architecture achieving SOTA on all of them.
  3. has better community maintenance: enthusiastic contributors from the community and I publish more applications (human portrait segmentation, massive training for general object extraction, ...) and many 3rd-party applications, some of which are listed in the README.
  4. has a better code framework (in my personal view), containing various plug-and-play modules, training acceleration, a better evaluation process, backbone options, and more.
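To illustrate the multi-view design mentioned in point 1, here is a minimal NumPy sketch (with made-up names; not MVANet's actual code) of cropping an image into a grid of patches stacked along a new batch-like axis. That extra axis is what makes adapting to bs > 1 awkward:

```python
import numpy as np

def crop_into_patches(image, grid=2):
    """Split an image of shape (H, W, C) into a grid x grid batch of patches.

    Illustrative sketch only; not MVANet's real implementation.
    """
    h, w = image.shape[:2]
    ph, pw = h // grid, w // grid
    patches = [
        image[r * ph:(r + 1) * ph, c * pw:(c + 1) * pw]
        for r in range(grid)
        for c in range(grid)
    ]
    # Patches are stacked along a new leading (batch-like) axis. With a real
    # input batch size > 1, this axis gets entangled with the data batch axis,
    # which is the adaptation difficulty mentioned above.
    return np.stack(patches, axis=0)

img = np.zeros((1024, 1024, 3), dtype=np.float32)
batch = crop_into_patches(img, grid=2)
print(batch.shape)  # (4, 512, 512, 3)
```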

ZhengPeng7 avatar May 08 '24 13:05 ZhengPeng7

BTW, have you run the code of MVANet?

ZhengPeng7 avatar May 08 '24 13:05 ZhengPeng7

Many thanks for your detailed reply. I am currently reimplementing the training process for MVANet and expect to finish it the day after tomorrow. If you're interested, I would be more than happy to discuss it with you and share my progress once it's completed. I will also delve deeper into your work and try it out. Thanks again.

wang21jun avatar May 08 '24 15:05 wang21jun

You are welcome :) Looking forward to your results with MVANet. I also did the re-training of it, but I'd like to know the results you reproduce.

ZhengPeng7 avatar May 08 '24 15:05 ZhengPeng7

Hi @wang21jun , got any results?

ZhengPeng7 avatar May 12 '24 02:05 ZhengPeng7

Trained with their given settings, that is:

  - epochs: 80
  - lr_gen: 1e-5
  - batch size: 1
  - train size: 1024
  - backbone (Swin-B) pretrained model: swin_base_patch4_window12_384_22kto1k.pth
  - training set: DIS5K-TR; evaluation set: DIS5K-VD

I got the following results: Smeasure: 0.877; meanEm: 0.888; wFmeasure: 0.803; maximal Fmeasure: 0.872; MAE: 0.046
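For reference, the MAE reported here is the standard pixel-wise mean absolute error between the predicted map and the ground truth, both in [0, 1]. A minimal sketch (not the official evaluator, which also computes S-measure, E-measure, and weighted F-measure):

```python
import numpy as np

def mae(pred, gt):
    """Mean Absolute Error between a predicted map and the ground-truth mask.

    Both inputs are expected to be in [0, 1]. Sketch of the standard metric
    used in DIS/SOD benchmarks, not the official evaluation code.
    """
    pred = pred.astype(np.float64)
    gt = gt.astype(np.float64)
    return float(np.abs(pred - gt).mean())

pred = np.array([[0.9, 0.1], [0.8, 0.2]])
gt = np.array([[1.0, 0.0], [1.0, 0.0]])
print(mae(pred, gt))  # 0.15
```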

The obtained results are somewhat unsatisfactory and need further examination. I am also retraining your BiRefNet model with the Swin-L backbone, which should be finished by tomorrow.

wang21jun avatar May 13 '24 05:05 wang21jun

Thx, that's a long process. Looking forward to hearing the results of retrained BiRefNet from you, too!

ZhengPeng7 avatar May 13 '24 07:05 ZhengPeng7

Trained on 2 A100-80G with this script: `./train_test.sh DIS 0,1 0` (keeping self.batch_size=4, Swin-L as backbone):

Smeasure: 0.885; meanEm: 0.92; wFmeasure: 0.838; maximal Fmeasure: 0.877; MAE: 0.041

Although I was unable to reproduce the exact results of the paper, this is the best outcome I achieved among the works I tried: IS-Net, SegRefiner, MVANet, BiRefNet, and so on. Taking into account both training and inference costs, I will explore several strategies to optimize the process. For instance, I will examine whether training for 600 epochs is truly necessary or whether fewer epochs could achieve satisfactory results. I will also consider switching the backbone to Swin-B, among other potential modifications, to further improve efficiency and performance.

wang21jun avatar May 14 '24 05:05 wang21jun

Glad to see your results! Did you run `python gen_best_ep.py` to select the best ckpt? The default setting is for training on 8*A100-80G, especially the learning rate (halve it for 2*A100-80G). In the following days, I'll also try some tricks, like half-precision training.
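To make the learning-rate advice concrete, here is a tiny sketch (a hypothetical helper, not part of the repo) contrasting the halving rule suggested above with the common linear-scaling alternative; the exact factor is a tunable judgment call, not a guarantee:

```python
def adjusted_lr(default_lr, default_gpus=8, gpus=2, rule="half"):
    """Sketch of adjusting the learning rate when training on fewer GPUs.

    Hypothetical helper for illustration. The maintainer's advice above is
    to halve the default lr when going from 8 to 2 A100s; a common
    alternative is scaling linearly with the effective batch size.
    """
    if rule == "half":
        return default_lr / 2
    # Linear scaling rule: lr proportional to the effective batch size.
    return default_lr * gpus / default_gpus

print(adjusted_lr(1e-4))                 # halved: 5e-05
print(adjusted_lr(1e-4, rule="linear"))  # linear: 2.5e-05
```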

BTW, if you want to save some time in evaluation, you can turn off the calculation of some metrics as given in this line. Looking forward to hearing good results from you on the experiments you want to do.

ZhengPeng7 avatar May 14 '24 06:05 ZhengPeng7

Sure.

ZhengPeng7 avatar May 14 '24 07:05 ZhengPeng7

Trained on 2*A100-80G with lr=3e-5, validated on DIS-VD. New results: maxFm=0.902; wFmeasure=0.861; MAE=0.035; Smeasure=0.906; meanEm=0.935; HCE=1057

wang21jun avatar May 28 '24 02:05 wang21jun

Wow, that's great! Even slightly better than my training on 8*A100-80G. There still seems to be some room for improvement by adapting the hyper-parameters. Thanks!

ZhengPeng7 avatar May 28 '24 03:05 ZhengPeng7

Also providing a result trained with Swin-B, lr=3e-5, bs=6: maxFm=0.897; wFmeasure=0.857; MAE=0.037; Smeasure=0.903; meanEm=0.944; HCE=1060

wang21jun avatar Jun 01 '24 15:06 wang21jun

Thanks for your updates! I've also spared time and GPUs to train BiRefNet with backbones at almost all scales. The results and weights have been uploaded to the Google Drive folder. Your results are similar to mine.

ZhengPeng7 avatar Jun 01 '24 19:06 ZhengPeng7