
BEiTv2 MIM accuracy

ckddls1321 opened this issue 2 years ago · 3 comments

Model I am using: BEiTv2 ViT-L/16.

I pre-trained on ImageNet-1K using the vqkd tokenizer, but the MIM (Masked Image Modeling) accuracy does not reach 40~50%.

Can you provide a log or a reference MIM accuracy for pre-training on ImageNet-1K with the tokenizer? Also, if you have evaluation results on ImageNet-1K, could you share them with us?

ckddls1321 avatar Oct 05 '22 02:10 ckddls1321

Hello,

The MIM accuracy is about 16% when using the vqkd tokenizer to pretrain ViT-L/16:

[screenshot: pre-training log]

When the pretraining schedule is 1600 epochs, the accuracy increases slightly:

[screenshot: pre-training log]
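For context, the MIM accuracy is the top-1 accuracy of the predicted visual tokens, measured only over the masked patches. Since the vqkd codebook is large (8,192 codes by default), an accuracy around 16% is already far above chance. Below is a minimal sketch of how such a metric can be computed; the function name and tensor shapes are assumptions for illustration, not the repository's actual code:

```python
import torch

def mim_accuracy(logits: torch.Tensor,
                 target_tokens: torch.Tensor,
                 mask: torch.Tensor) -> torch.Tensor:
    """Top-1 accuracy of predicted visual tokens at masked positions.

    logits:        [B, N, vocab] per-patch predictions from the MIM head
    target_tokens: [B, N] discrete codes produced by the vqkd tokenizer
    mask:          [B, N] bool, True where a patch was masked out
    """
    preds = logits.argmax(dim=-1)              # most likely code per patch
    correct = (preds == target_tokens) & mask  # count hits on masked patches only
    return correct.sum() / mask.sum().clamp(min=1)
```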

pengzhiliang avatar Oct 06 '22 11:10 pengzhiliang

Thank you! We observe the same trend during training.

During training, the model gets better at predicting masked patches at every scale from the visual information, even though it would be hard to predict all scales correctly. The latest checkpoint performs best.

ckddls1321 avatar Oct 21 '22 08:10 ckddls1321

What about BEiTv2 ViT-B/16?

zengshao0622 avatar Nov 25 '22 03:11 zengshao0622