unilm
BEiTv2 MIM accuracy
Describe Model: I am using BEiTv2 ViT-L/16.
I pre-trained on ImageNet-1K with the vqkd tokenizer, but the MIM (Masked Image Modeling) accuracy does not reach 40~50%.
Can you provide a log or an accuracy reference for pre-training on ImageNet-1K with the tokenizer? Also, if you have evaluation results on ImageNet-1K, could you share them with us?
Hello,
The MIM accuracy is about 16% when using the vqkd tokenizer to pretrain ViT-L/16.
With a 1600-epoch pretraining schedule, the accuracy increases slightly.
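For context, the MIM accuracy discussed here is typically the fraction of masked patches whose predicted visual-token id matches the target id produced by the vqkd tokenizer. A minimal sketch of that metric (the function name, shapes, and toy values are illustrative, not BEiTv2's actual code):

```python
def mim_accuracy(pred_ids, target_ids, mask):
    """Fraction of masked positions where the predicted
    visual-token id equals the tokenizer's target id."""
    # Count correct predictions, but only at masked positions.
    correct = sum(1 for p, t, m in zip(pred_ids, target_ids, mask) if m and p == t)
    total = sum(mask)
    return correct / total if total else 0.0

# Toy example: 4 patches, the first 3 are masked.
preds   = [17, 42, 99, 5]
targets = [17, 42, 12, 5]
mask    = [True, True, True, False]
print(mim_accuracy(preds, targets, mask))  # → 0.666... (2 of 3 masked patches correct)
```

Low absolute values (e.g. ~16%) are still informative because the codebook is large, so chance-level accuracy is far lower.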
Thank you! We also have the same trends during training.
During training, the model gets better at predicting masked patches from visual information at every scale, even though it is hard to predict all scales correctly. The latest checkpoint performs best.
What about BEiTv2 ViT-B/16?