SegFormer

Pre-training info

Open nargenziano opened this issue 1 year ago • 5 comments

Hello and thanks for the work.

I was wondering if you could share more info about the pre-training of the MiT architectures. I've read in other issues that the configs are the same as pvt_v2, but what is the actual pre-training code you used? Is it the PVT classification training? I edited the PyramidVisionTransformer model to make it identical to MiT-B3 and ran PVT's ImageNet classification training from scratch, but the classification performance was worse than expected for PVT-v2 B3 (around 77.3% Acc@1 instead of the expected 83.1%). What is the expected pre-training performance of MiT?

nargenziano avatar Jan 19 '23 17:01 nargenziano

I have the same question.

The paper states "We pre-train the encoder on the Imagenet-1K dataset".

Does this mean the encoder is first trained on a classification task? If so, is there code for this that you can share? I cannot find it in the repo.

Primarily, I want to be able to reproduce the `mit_*.pth` files, either conceptually or with your code.

gauenk avatar Mar 03 '23 18:03 gauenk

following up...

gauenk avatar Apr 08 '23 04:04 gauenk

Same question here. It seems the (commented-out) classification head in the MiT backbone won't work, because the output of stage 4 is B×49×512, which can't be fed directly into an nn.Linear to produce B×1000.
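For what it's worth, the shape mismatch above can be resolved by pooling over the token dimension before the linear head, which is how PVTv2 handles ImageNet classification. The sketch below is not the authors' code, just a hypothetical illustration (the `PooledClsHead` name and dimensions are assumptions based on the B×49×512 stage-4 output mentioned above):

```python
import torch
import torch.nn as nn

# Hypothetical sketch, not the authors' pre-training code: average-pool the
# stage-4 token sequence over the token axis, then apply a linear classifier,
# as PVTv2 does for ImageNet-1K classification.
class PooledClsHead(nn.Module):
    def __init__(self, embed_dim=512, num_classes=1000):
        super().__init__()
        self.norm = nn.LayerNorm(embed_dim)
        self.head = nn.Linear(embed_dim, num_classes)

    def forward(self, x):       # x: (B, N, C), e.g. (B, 49, 512) from stage 4
        x = self.norm(x)
        x = x.mean(dim=1)       # global average pool over the N tokens -> (B, C)
        return self.head(x)     # -> (B, num_classes)

feats = torch.randn(2, 49, 512)    # stand-in for MiT-B3 stage-4 output
logits = PooledClsHead()(feats)
print(logits.shape)                # torch.Size([2, 1000])
```

Whether the released `mit_*` checkpoints were actually pre-trained with this exact head is something only the authors can confirm.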

wangh09 avatar Apr 18 '23 07:04 wangh09

following up..

Mike-HH avatar May 17 '23 03:05 Mike-HH

following up...

waw123456 avatar Jan 08 '24 08:01 waw123456