ConvMAE

Refactor hard-coded numbers for more control over parameters (MaskedAutoencoderConvViT)

Open · DanTaranis opened this issue 3 years ago · 1 comment

Hi, I'd like to use 32x32 patches and a smaller model in general, but anything I change breaks the entire code. It would be really helpful if you refactored out all the places in `MaskedAutoencoderConvViT` that hard-code 4, 2, 16, etc.

Thanks, Dan

DanTaranis · Dec 12 '22 10:12

Sorry for the trouble. Please refer to the following code for ViT-16:

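```python
# Per-stage input resolution and output feature-map size (224x224 input, ViT-16)
img_size = [224, 56, 28]
feat_size = [56, 28, 14]
# Upsampling factors from the last-stage token grid to the stage-1 / stage-2 grids
rel_scale1 = int(feat_size[0] / feat_size[2])  # 56 / 14 = 4
rel_scale2 = int(feat_size[1] / feat_size[2])  # 28 / 14 = 2
# `mask` holds one value per last-stage token; x.shape[0] is the batch size.
# Expand each token's mask value into a rel_scale1 x rel_scale1 block at stage 1.
mask_for_patch1 = (
    mask.reshape(-1, feat_size[-1], feat_size[-1])
    .unsqueeze(-1)
    .repeat(1, 1, 1, rel_scale1 ** 2)
    .reshape(-1, feat_size[-1], feat_size[-1], rel_scale1, rel_scale1)
    .permute(0, 1, 3, 2, 4)
    .reshape(x.shape[0], feat_size[0], feat_size[0])
    .unsqueeze(1)
)
```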
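The hard-coded sizes in the snippet above follow from the per-stage downsampling factors (4, 2, 2 for a 224x224 input, i.e. an overall patch size of 16). A minimal sketch of deriving them instead, assuming the same three-stage layout and a mask with one value per last-stage token. `stage_geometry` and `upsample_mask` are hypothetical helper names, not part of ConvMAE; `repeat_interleave` is swapped in for the reshape/repeat/permute chain and should produce the same nearest-neighbour mask.

```python
import torch


def stage_geometry(img_size=224, patch_sizes=(4, 2, 2)):
    """Hypothetical helper: per-stage input and output resolutions.

    patch_sizes[i] is the downsampling factor of stage i; their product is
    the overall patch size (4 * 2 * 2 = 16 for ViT-16).
    """
    in_sizes, out_sizes = [], []
    size = img_size
    for p in patch_sizes:
        if size % p != 0:
            raise ValueError(f"size {size} is not divisible by patch size {p}")
        in_sizes.append(size)
        size //= p
        out_sizes.append(size)
    return in_sizes, out_sizes


def upsample_mask(mask, feat_size, stage):
    """Hypothetical helper: upsample a (B, N) last-stage token mask to the
    feat_size[stage] x feat_size[stage] grid, returning (B, 1, H, W)
    like mask_for_patch1."""
    grid = feat_size[-1]
    scale = feat_size[stage] // grid  # rel_scale1 for stage 0, rel_scale2 for stage 1
    m = mask.reshape(-1, grid, grid)
    # Repeat every token `scale` times along both spatial axes.
    m = m.repeat_interleave(scale, dim=1).repeat_interleave(scale, dim=2)
    return m.unsqueeze(1)
```

For example, `stage_geometry(224, (4, 2, 2))` returns `[224, 56, 28]` and `[56, 28, 14]`, the values hard-coded above. For 32x32 patches, one possible (untested) split is `(4, 2, 4)`, which gives a last-stage grid of 7x7, `rel_scale1 = 8` and `rel_scale2 = 4`; the patch-embedding layers would need matching patch sizes.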

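You also need to modify the strides of `self.stage1_output_decode` and `self.stage2_output_decode`, so that the stage-1 and stage-2 feature maps are downsampled to the last-stage grid (by `rel_scale1` and `rel_scale2`, respectively).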
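The stride change can be sketched as follows, assuming both decode layers are strided `nn.Conv2d` layers that map stage-1 / stage-2 features onto the last-stage grid, with kernel size equal to stride. `build_output_decoders` is a hypothetical helper, and the channel widths `(64, 128, 256)` are placeholders for a smaller model, not values from ConvMAE.

```python
import torch
import torch.nn as nn


def build_output_decoders(embed_dim, feat_size):
    """Hypothetical helper: convs whose strides are derived from feat_size
    instead of hard-coded 4 and 2, so stage-1 and stage-2 features land on
    the last-stage grid. Kernel size == stride, so windows do not overlap."""
    rel_scale1 = feat_size[0] // feat_size[-1]
    rel_scale2 = feat_size[1] // feat_size[-1]
    stage1_output_decode = nn.Conv2d(embed_dim[0], embed_dim[2], rel_scale1, stride=rel_scale1)
    stage2_output_decode = nn.Conv2d(embed_dim[1], embed_dim[2], rel_scale2, stride=rel_scale2)
    return stage1_output_decode, stage2_output_decode


# Example: a smaller model with 32x32 overall patches split as (4, 2, 4).
feat_size = [56, 28, 7]
dec1, dec2 = build_output_decoders((64, 128, 256), feat_size)
x1 = torch.randn(2, 64, 56, 56)   # stage-1 features
x2 = torch.randn(2, 128, 28, 28)  # stage-2 features
print(dec1(x1).shape, dec2(x2).shape)  # both torch.Size([2, 256, 7, 7])
```

Deriving the strides from `feat_size` keeps the decode layers consistent with the mask upsampling factors, so changing the per-stage patch sizes only has to happen in one place.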

gaopengpjlab · Dec 13 '22 06:12