
How is ViTDet backbone pretrained with MAE?

Open prakashjayy opened this issue 10 months ago • 0 comments

The paper mentions that the image encoder is pretrained using MAE. I want to understand how the network is pretrained with MAE when the window size is (14, 14). Do we pretrain with a window size of (0, 0) and then fine-tune with (14, 14)?
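To make the question concrete, here is a minimal sketch of how a window size partitions the token grid. The helper `window_layout` is hypothetical and not part of the repository. It assumes that a window size of 0 means global attention, that the grid is padded up to a multiple of the window size, that the encoder sees 1024x1024 inputs with 16x16 patches, and that MAE pretraining uses 224x224 inputs with 16x16 patches.

```python
def window_layout(height, width, window_size):
    """Return (num_windows, tokens_per_window) for a token grid.

    Hypothetical helper for illustration only.
    Assumption: window_size == 0 means global attention over the whole grid.
    """
    if window_size == 0:
        return 1, height * width
    # Assumption: the grid is padded up to the next multiple of window_size.
    pad_h = (window_size - height % window_size) % window_size
    pad_w = (window_size - width % window_size) % window_size
    rows = (height + pad_h) // window_size
    cols = (width + pad_w) // window_size
    return rows * cols, window_size * window_size


# Assumed fine-tuning setup: 1024x1024 input, 16x16 patches -> 64x64 tokens.
sam_grid = 1024 // 16
print(window_layout(sam_grid, sam_grid, 14))  # windowed attention
print(window_layout(sam_grid, sam_grid, 0))   # global attention

# Assumed MAE pretraining setup: 224x224 input, 16x16 patches -> 14x14 tokens.
mae_grid = 224 // 16
print(window_layout(mae_grid, mae_grid, 14))
```

Under these assumptions a 14x14 window spans the entire pretraining grid, so windowed and global attention would see the same 196 tokens during pretraining. Whether the released checkpoints were actually trained this way is exactly what I am asking.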

Thanks

prakashjayy · Apr 17 '24 07:04