
Training ViT on ImageNet-1k gets bad performance.

songlei00 opened this issue on Aug 31, 2021 · 2 comments

I am training ViT on ImageNet-1k from scratch. SOTA accuracy is about 70% to 80%, but I can only reach 30%, and I don't know why. I use the following configuration.

import torch
from torch.optim.lr_scheduler import CosineAnnealingLR
from vit_pytorch import ViT

model = ViT(
    image_size=224,               # 224x224 inputs
    patch_size=32,                # 7x7 = 49 patches per image
    num_classes=args['n_class'],
    dim=768,                      # ViT-Base embedding dimension
    depth=args['depth'],
    heads=12,
    mlp_dim=3072,
    dropout=0.1,
    emb_dropout=0.1
)
optimizer = torch.optim.Adam(
    model.parameters(),
    lr=1e-3,
    betas=(0.9, 0.999),
    weight_decay=0.0001
)
scheduler = CosineAnnealingLR(optimizer, T_max=1270, eta_min=1e-5)

The batch size is 1024, and I step the scheduler (i.e., adjust the learning rate) after each batch.
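Roughly, the training loop looks like this (a sketch; train_loader and num_epochs are placeholders for my actual data pipeline). Note that at batch size 1024, ImageNet-1k is roughly 1,250 iterations per epoch, so T_max=1270 covers about one epoch of per-batch steps.

import torch
import torch.nn.functional as F

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = model.to(device)

for epoch in range(num_epochs):          # num_epochs: placeholder
    for images, labels in train_loader:  # train_loader: assumed ImageNet loader
        images, labels = images.to(device), labels.to(device)
        loss = F.cross_entropy(model(images), labels)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        scheduler.step()  # learning rate adjusted after every batch, as described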

songlei00 · Aug 31, 2021

@s974534426 it's pretty hard to train a plain ViT from scratch if you are not Google or Facebook.

Try https://github.com/lucidrains/vit-pytorch#nest instead; you should have an easier time with that.
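For reference, the README at that link instantiates NesT along these lines (these are the README's example values, not settings tuned for ImageNet-1k; check the repo for the current API):

import torch
from vit_pytorch.nest import NesT

nest = NesT(
    image_size = 224,
    patch_size = 4,
    dim = 96,                   # dimension at the first hierarchy
    heads = 3,
    num_hierarchies = 3,        # number of hierarchies
    block_repeats = (2, 2, 8),  # transformer blocks per hierarchy, bottom up
    num_classes = 1000
)

img = torch.randn(1, 3, 224, 224)
pred = nest(img)  # (1, 1000)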

lucidrains · Aug 31, 2021

I'm also trying to train on ImageNet-1k from scratch using my own optimizer and hyper-parameters. I got a better result of 55% top-1 accuracy, which is still far from the SOTA reported in the ViT paper. What is your current progress? @s974534426 Have you tried the hyper-parameters from the original ViT paper?
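For reference, the ViT paper (Appendix B) trains from scratch with Adam (betas 0.9/0.999), a large batch, strong weight decay (0.1), and a linear learning-rate warmup (10k steps) followed by decay. Below is a rough sketch of warmup plus cosine decay; cosine is chosen here to match the CosineAnnealingLR above, while the paper itself describes linear decay. warmup_steps and total_steps are placeholders.

import math
from torch.optim.lr_scheduler import LambdaLR

def warmup_then_cosine(optimizer, warmup_steps, total_steps):
    # Linear warmup from 0 to the base LR, then cosine decay toward 0.
    def lr_lambda(step):
        if step < warmup_steps:
            return step / max(1, warmup_steps)
        progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
        return 0.5 * (1.0 + math.cos(math.pi * progress))
    return LambdaLR(optimizer, lr_lambda)

# e.g. scheduler = warmup_then_cosine(optimizer, warmup_steps=10_000, total_steps=...)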

whu-dft · Apr 8, 2022