Refactor hard-coded numbers for more control over parameters (MaskedAutoencoderConvViT)
Hi - I'd like to use patches of size 32x32, and a smaller model in general. Anything I change breaks the entire code. It would be really helpful if you refactored out all of the places that hard-code 4, 2, 16, etc. throughout the code for MaskedAutoencoderConvViT.
Thanks, Dan
Sorry for the trouble. Please refer to the following code for ViT-16.
    img_size = [224, 56, 28]
    feat_size = [56, 28, 14]
    rel_scale1 = int(feat_size[0] / feat_size[2])
    rel_scale2 = int(feat_size[1] / feat_size[2])
    mask_for_patch1 = (
        mask.reshape(-1, feat_size[-1], feat_size[-1])
        .unsqueeze(-1)
        .repeat(1, 1, 1, rel_scale1 ** 2)
        .reshape(-1, feat_size[-1], feat_size[-1], rel_scale1, rel_scale1)
        .permute(0, 1, 3, 2, 4)
        .reshape(x.shape[0], feat_size[0], feat_size[0])
        .unsqueeze(1)
    )
You also need to modify the stride of self.stage1_output_decode and self.stage2_output_decode.
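For anyone adapting this to other patch sizes, here is a rough sketch of how the numbers in the snippet above could be derived instead of typed in by hand. `derive_stage_sizes` and its `patch_sizes` argument are hypothetical names, not part of the repo. The `(4, 2, 4)` split for 32x32 patches is only one possible choice. Using the resulting relative scales as the strides of the two decode layers is my assumption, so please check it against the model code.

```python
def derive_stage_sizes(img_size=224, patch_sizes=(4, 2, 2)):
    """Derive per-stage sizes from the image size and per-stage patch sizes.

    Hypothetical helper for illustration; not part of the original code base.
    """
    feat_size = []
    size = img_size
    for p in patch_sizes:
        if size % p != 0:
            raise ValueError(f"size {size} is not divisible by patch size {p}")
        size //= p
        feat_size.append(size)
    # Input resolution seen by each stage (img_size in the snippet above).
    stage_input = [img_size] + feat_size[:-1]
    # How much larger each earlier feature map is than the final one
    # (rel_scale1, rel_scale2 in the snippet above).
    rel_scale = [f // feat_size[-1] for f in feat_size[:-1]]
    return stage_input, feat_size, rel_scale


# ViT-16 setting from the reply: total patch size 4 * 2 * 2 = 16.
print(derive_stage_sizes(224, (4, 2, 2)))
# → ([224, 56, 28], [56, 28, 14], [4, 2])

# One possible 32x32 setting (assumed split): 4 * 2 * 4 = 32.
print(derive_stage_sizes(224, (4, 2, 4)))
# → ([224, 56, 28], [56, 28, 7], [8, 4])
```

With the second setting, `rel_scale1` would be 8 and `rel_scale2` would be 4, so the mask expansion and (presumably) the decode strides would need those values in place of 4 and 2.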