SegFormer

Pre-training info

Open nargenziano opened this issue 1 year ago • 5 comments

Hello and thanks for the work.

I was wondering if you could share more info about the pre-training of the MiT architectures. I've read in other issues that the configs are the same as pvt_v2, but what is the actual pre-training code you used? Is it the PVT classification training? I edited the PyramidVisionTransformer model to make it identical to MiT-B3 and ran PVT's ImageNet classification training from scratch, but the classification performance was worse than expected for PVT-v2 B3 (around 77.3% Acc@1 instead of the expected 83.1%). What is the expected pre-training performance of MiT?

nargenziano avatar Jan 19 '23 17:01 nargenziano

I have the same question.

The paper states "We pre-train the encoder on the Imagenet-1K dataset".

Does this mean the encoder is first trained on a classification task? If so, is there code for this that you can share? I cannot find it in the repo.

Primarily, I want to be able to reproduce the `mit_*.pth` files, either conceptually or with your code.

gauenk avatar Mar 03 '23 18:03 gauenk

following up...

gauenk avatar Apr 08 '23 04:04 gauenk

Same question here. It seems the (commented-out) classification head in the MiT backbone won't work, because the output of stage 4 is B×49×512, which can't be fed directly into an nn.Linear to produce B×1000.
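For what it's worth, the shape mismatch above can be resolved by pooling over the token dimension before the linear head, which is how PVTv2 handles ImageNet classification. The sketch below is not the authors' code, just a hypothetical illustration (the `PooledClsHead` name and dimensions are assumptions based on the B×49×512 stage-4 output mentioned above):

```python
import torch
import torch.nn as nn

# Hypothetical sketch, not the authors' pre-training code: average-pool the
# stage-4 token sequence over the token axis, then apply a linear classifier,
# as PVTv2 does for ImageNet-1K classification.
class PooledClsHead(nn.Module):
    def __init__(self, embed_dim=512, num_classes=1000):
        super().__init__()
        self.norm = nn.LayerNorm(embed_dim)
        self.head = nn.Linear(embed_dim, num_classes)

    def forward(self, x):       # x: (B, N, C), e.g. (B, 49, 512) from stage 4
        x = self.norm(x)
        x = x.mean(dim=1)       # global average pool over the N tokens -> (B, C)
        return self.head(x)     # -> (B, num_classes)

feats = torch.randn(2, 49, 512)    # stand-in for MiT-B3 stage-4 output
logits = PooledClsHead()(feats)
print(logits.shape)                # torch.Size([2, 1000])
```

Whether the released `mit_*` checkpoints were actually pre-trained with this exact head is something only the authors can confirm.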

wangh09 avatar Apr 18 '23 07:04 wangh09

following up..

Mike-HH avatar May 17 '23 03:05 Mike-HH

following up...

waw123456 avatar Jan 08 '24 08:01 waw123456