segment-anything
How is the ViTDet backbone pretrained with MAE?
In the paper it is mentioned that an image encoder pretrained using MAE is used. I just want to understand how the network is pretrained with MAE when the window size is (14, 14). Do we pretrain with a window size of (0, 0) and then fine-tune with (14, 14)?
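
For reference, this is a rough sketch of the two configurations I am asking about, using `segment_anything.modeling.image_encoder.ImageEncoderViT`. The ViT-B values are my reading of `build_sam.py`; the second, fully global configuration is only my assumption of what the MAE-pretraining setup would look like.

```python
# Sketch of the two encoder configurations in question (not the official
# training code). ViT-B sizes follow build_sam.py; the "plain" config is
# my assumption for the MAE pretraining stage.
from functools import partial

import torch
import torch.nn as nn

from segment_anything.modeling.image_encoder import ImageEncoderViT

# SAM's ViT-B image encoder as built in build_sam.py: 14x14 windowed
# attention in most blocks, global attention only at indexes 2, 5, 8, 11.
sam_vit_b_encoder = ImageEncoderViT(
    img_size=1024,
    patch_size=16,
    embed_dim=768,
    depth=12,
    num_heads=12,
    mlp_ratio=4.0,
    out_chans=256,
    qkv_bias=True,
    norm_layer=partial(nn.LayerNorm, eps=1e-6),
    use_rel_pos=True,
    window_size=14,
    global_attn_indexes=(2, 5, 8, 11),
)

# What I assume "window size (0, 0)" would mean: window_size=0 disables
# windowing in this implementation, so every block uses global attention,
# i.e. a plain ViT as used for MAE pretraining (my assumption).
plain_vit_b_encoder = ImageEncoderViT(
    img_size=1024,
    patch_size=16,
    embed_dim=768,
    depth=12,
    num_heads=12,
    mlp_ratio=4.0,
    out_chans=256,
    qkv_bias=True,
    norm_layer=partial(nn.LayerNorm, eps=1e-6),
    use_rel_pos=True,
    window_size=0,
    global_attn_indexes=(),
)

with torch.no_grad():
    x = torch.randn(1, 3, 1024, 1024)
    print(sam_vit_b_encoder(x).shape)  # torch.Size([1, 256, 64, 64])
```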
Thanks