Tianhong Li
Our implementation on CIFAR is not based on the MoCo and ImageNet framework, so it could be hard to adapt the released code directly to CIFAR. Our implementation on CIFAR...
This command generates images from scratch, i.e., with a 100% masking ratio.
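A 100% masking ratio means generation starts from a fully masked token grid and fills it in over several decoding steps. A minimal, self-contained sketch of that loop (the token counts, step count, cosine schedule, and random "predictions" here are illustrative assumptions, not MAGE's actual implementation):

```python
import math
import random

def generate_from_scratch(num_tokens=256, vocab_size=1024, steps=8, seed=0):
    """Schematically fill a fully masked token sequence over several steps."""
    rng = random.Random(seed)
    MASK = -1
    tokens = [MASK] * num_tokens  # 100% masking ratio: every token starts masked

    for step in range(steps):
        # Cosine schedule: fraction of tokens still masked after this step.
        frac = math.cos(math.pi / 2 * (step + 1) / steps)
        keep_masked = int(num_tokens * frac)
        masked_positions = [i for i, t in enumerate(tokens) if t == MASK]
        # "Predict" tokens at some masked positions
        # (random stand-in for the transformer's predictions).
        n_fill = len(masked_positions) - keep_masked
        for pos in rng.sample(masked_positions, n_fill):
            tokens[pos] = rng.randrange(vocab_size)
    return tokens

tokens = generate_from_scratch()
```

After the final step the schedule reaches zero, so every position holds a generated token; partial masking ratios would simply start the loop with some tokens already set.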
Regarding the code for reconstructing masked images: I'm currently working on another deadline, so I haven't released that part of the code yet. But it should be fairly easy to implement by providing...
@zhihao-2022 if you want to use 32x32 images as input, you need to change the pre-trained VQGAN, as the provided one is pre-trained on ImageNet 256x256 with patch size 16....
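To see why the provided tokenizer doesn't match 32x32 inputs: with patch size 16, the token grid is input_size / 16 per side. A quick sanity check (the helper name is a hypothetical sketch; the sizes come from the answer above):

```python
def token_grid(input_size, patch_size=16):
    """Tokens per side and total token count for a patch-based tokenizer."""
    assert input_size % patch_size == 0, "input must be divisible by patch size"
    side = input_size // patch_size
    return side, side * side

# ImageNet 256x256 with patch size 16 -> a 16x16 grid of 256 tokens
print(token_grid(256))  # (16, 256)
# 32x32 with the same patch size -> only a 2x2 grid of 4 tokens
print(token_grid(32))   # (2, 4)
```

Four tokens are far too few to represent an image, which is why the VQGAN would need to be re-trained (or re-configured) for 32x32 inputs.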
Yes, Inception Score can typically vary a bit. A variation of 1-2 standard deviations is expected.
Yes, you need to re-train the VQGAN -- the provided VQGAN checkpoint is pre-trained on ImageNet-1K, which does not include galaxy images. Unfortunately, we cannot release the code for VQGAN...
If you plan to fine-tune the ImageNet pre-trained MAGE on your dataset, you only need to change nb_classes to 47 in main_finetune. Performance can be poor for many reasons...
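A sketch of what that invocation might look like for a 47-class dataset; only the nb_classes setting comes from the answer above, while the script name's flags and the paths are placeholder assumptions:

```shell
# Hypothetical example: fine-tune an ImageNet pre-trained MAGE checkpoint
# on a 47-class dataset. Paths and the other flags are illustrative.
python main_finetune.py \
    --nb_classes 47 \
    --finetune /path/to/mage_pretrained.pth \
    --data_path /path/to/your/dataset
```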
What model are you using? It seems the initialized model does not contain fc_norm and head. The correct model to use for fine-tuned checkpoints is vit_large_patch16 from...
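One way to diagnose this kind of mismatch is to check whether the checkpoint's state dict actually contains the classification-specific parameters. The `fc_norm` and `head` names come from the answer above; the helper itself is a hypothetical sketch, not part of the released code:

```python
def missing_finetune_keys(state_dict_keys):
    """Return classification-head parameter prefixes absent from a checkpoint.

    A fine-tuned ViT checkpoint (e.g. vit_large_patch16) should contain
    fc_norm.* and head.* weights; a pre-training-only checkpoint will not.
    """
    required = ("fc_norm", "head")
    present = {k.split(".")[0] for k in state_dict_keys}
    return [p for p in required if p not in present]

# A pre-training-style checkpoint: no classifier weights
print(missing_finetune_keys(["blocks.0.attn.qkv.weight", "norm.weight"]))
# -> ['fc_norm', 'head']

# A fine-tuned checkpoint: classifier weights present
print(missing_finetune_keys(
    ["blocks.0.attn.qkv.weight", "fc_norm.weight", "head.weight"]))
# -> []
```

If the first case matches your checkpoint, you are loading a pre-training checkpoint into a classification model (or vice versa), which produces exactly the missing-key errors described.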
@rememberBr A fine-tuned model is fine-tuned for classification and cannot be used for generation.
Thanks for your interest! For Figure 1 in the paper, as mentioned in the caption, "the mask for MAGE is on semantic tokens whereas that of MAE is on patches...