Some questions about changing the image classification dataset
If I change the dataset, for example to one with 47 classes, what else do I need to do besides changing nb_classes to 47 in main_finetune? With only that change, the final accuracy is not very high. I am not sure whether the 1000 in vocab_size = codebook_size + 1000 + 1 should also be modified. If I do modify it, loading the checkpoint reports an error: RuntimeError: Error(s) in loading state_dict for VisionTransformerMage: size mismatch for token_emb.word_embeddings.weight: copying a param with shape torch.Size([2025, 768]) from checkpoint, the shape in current model is torch.Size([1072, 768]).
If you plan to fine-tune the ImageNet pre-trained MAGE on your dataset, you only need to change nb_classes to 47 in main_finetune. The 1000 in vocab_size should stay as it is: the checkpoint's token embedding has 2025 rows, so changing the 1000 to 47 produces the 1072-row size mismatch shown in the error. Poor performance can have many causes; one is that your dataset's image distribution is too far from ImageNet's. You could also adjust the number of training epochs: if your dataset is much smaller than ImageNet, you should increase the fine-tuning epochs.
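To make the shape mismatch in the error concrete, here is a small sketch of the arithmetic behind it. It is not taken from the MAGE code: the helper `vocab_size` is hypothetical and only mirrors the formula quoted in the question, and the codebook size of 1024 is an assumption inferred from the checkpoint shape (2025 - 1000 - 1).

```python
# Assumption: codebook size inferred from the error message (2025 - 1000 - 1),
# not read from the MAGE source.
CODEBOOK_SIZE = 1024


def vocab_size(codebook_size: int, num_label_slots: int) -> int:
    """Hypothetical helper mirroring the quoted formula:
    vocab_size = codebook_size + <label slots> + 1."""
    return codebook_size + num_label_slots + 1


# Rows of token_emb.word_embeddings.weight in the pre-trained checkpoint.
checkpoint_rows = vocab_size(CODEBOOK_SIZE, 1000)

# Rows the model would have if the 1000 were replaced by 47.
modified_rows = vocab_size(CODEBOOK_SIZE, 47)

# The checkpoint can only be loaded when both row counts agree.
shapes_match = checkpoint_rows == modified_rows

print(f"checkpoint rows: {checkpoint_rows}")
print(f"modified rows:   {modified_rows}")
print(f"shapes match:    {shapes_match}")
```

Under these assumptions the two row counts are exactly the 2025 and 1072 from the error message, which is why only nb_classes should change while vocab_size stays at its pre-training value.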