vit-pytorch
Training ViT on ImageNet-1k gives poor performance.
I am using ViT to train on ImageNet-1k from scratch. The SOTA accuracy is around 70% to 80%, but I can only reach 30%, and I don't know why. I use the following configuration:
import torch
from torch.optim.lr_scheduler import CosineAnnealingLR
from vit_pytorch import ViT

model = ViT(
    image_size=224,
    patch_size=32,
    num_classes=args['n_class'],  # 1000 for ImageNet-1k
    dim=768,
    depth=args['depth'],          # 12 would match ViT-Base
    heads=12,
    mlp_dim=3072,
    dropout=0.1,
    emb_dropout=0.1
)
optimizer = torch.optim.Adam(
    model.parameters(),
    lr=1e-3,
    betas=(0.9, 0.999),
    weight_decay=0.0001
)
scheduler = CosineAnnealingLR(optimizer, T_max=1270, eta_min=1e-5)
The batch size is 1024, and I step the scheduler after each batch.
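One thing to note: CosineAnnealingLR counts T_max in scheduler steps, so with per-batch stepping at batch size 1024 (~1250 batches per ImageNet-1k epoch), T_max=1270 finishes the cosine cycle within roughly one epoch. ViT-style recipes also typically use a linear warmup before the decay. Below is a minimal sketch of warmup followed by a single full-run cosine decay; the warmup length and step counts are assumptions for illustration, not values from the post above.

import torch
from torch.optim.lr_scheduler import CosineAnnealingLR, LinearLR, SequentialLR

model = torch.nn.Linear(10, 2)  # stand-in for the ViT model above
optimizer = torch.optim.Adam(
    model.parameters(), lr=1e-3, betas=(0.9, 0.999), weight_decay=0.0001
)

warmup_steps = 10_000       # assumed warmup length
total_steps = 300 * 1250    # assumed: 300 epochs x ~1250 batches/epoch at batch size 1024

scheduler = SequentialLR(
    optimizer,
    schedulers=[
        # ramp the LR linearly from lr * 1e-3 up to lr over the warmup
        LinearLR(optimizer, start_factor=1e-3, total_iters=warmup_steps),
        # then decay with one cosine cycle over the remaining steps
        CosineAnnealingLR(optimizer, T_max=total_steps - warmup_steps, eta_min=1e-5),
    ],
    milestones=[warmup_steps],
)

for _ in range(total_steps):
    optimizer.step()   # placeholder for forward/backward on a real batch
    scheduler.step()   # stepped once per batch, as described above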
@s974534426 it's pretty hard to train a plain ViT from scratch if you are not Google or Facebook
try https://github.com/lucidrains/vit-pytorch#nest , you should have an easier time with that
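For reference, instantiating NesT from this repo looks like the following. The hyperparameters mirror the README's NesT example and are a starting point, not tuned values for ImageNet-1k.

import torch
from vit_pytorch.nest import NesT

nest = NesT(
    image_size = 224,
    patch_size = 4,
    dim = 96,
    heads = 3,
    num_hierarchies = 3,         # number of hierarchies
    block_repeats = (2, 2, 8),   # transformer blocks per hierarchy, bottom up
    num_classes = 1000           # ImageNet-1k
)

img = torch.randn(1, 3, 224, 224)
pred = nest(img)  # (1, 1000)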
I'm also trying to train on ImageNet-1k from scratch using my own optimizer and hyper-parameters. I got a better result of 55% top-1, which is still far from the SOTA result reported in the ViT paper. How is your progress going? @s974534426 Have you tried the hyper-parameters from the original ViT paper?
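For comparison, the original ViT paper's from-scratch ImageNet recipe differs from the setup above mainly in batch size, weight decay, warmup, and gradient clipping. The sketch below is my recollection of those settings (Adam, batch size 4096, much stronger weight decay, global-norm gradient clipping at 1.0); double-check the paper's Table 3 before copying the exact numbers.

import torch
from torch.nn.utils import clip_grad_norm_

model = torch.nn.Linear(10, 2)  # stand-in for the ViT model
optimizer = torch.optim.Adam(
    model.parameters(),
    lr=3e-3,            # assumed base LR at batch size 4096; scale down for smaller batches
    betas=(0.9, 0.999),
    weight_decay=0.3,   # far higher than the 1e-4 above; the paper relies on strong weight decay
)

x, y = torch.randn(8, 10), torch.randint(0, 2, (8,))
loss = torch.nn.functional.cross_entropy(model(x), y)
loss.backward()
clip_grad_norm_(model.parameters(), max_norm=1.0)  # clip gradients at global norm 1
optimizer.step()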