vit-pytorch

ViT model converted from TensorFlow

Google Vision Transformer

converted from https://github.com/google-research/vision_transformer

referenced from https://github.com/lucidrains/vit-pytorch

model table

model type	input size	PyTorch weights
ViT-B_16	224×224	B_16-224 (extraction code: 9mrd)

usage

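import torch
from vit_pytorch import VIT_B16_224

# Build the model and load the converted weights
model = VIT_B16_224()
model.load_state_dict(torch.load('imagenet21k+imagenet2012_ViT-B_16-224.pth'))
model.eval()  # inference mode

# Dummy batch: one RGB image at the expected input size
input_size = 224
img = torch.randn(1, 3, input_size, input_size)

preds = model(img)  # (1, 1000)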
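As a small follow-up to the snippet above, here is one way to turn the `(1, 1000)` output into top-5 class probabilities. This is only a sketch: it assumes the model returns raw logits (the README does not say so explicitly), the helper name `top_k_predictions` is hypothetical, and a random tensor stands in for `model(img)` so the block runs without the weights file.

```python
import torch


def top_k_predictions(logits, k=5):
    """Return (probabilities, class indices) of the k most likely classes per image.

    Assumes `logits` has shape (batch, num_classes) and holds unnormalized scores.
    """
    probs = torch.softmax(logits, dim=-1)
    return torch.topk(probs, k=k, dim=-1)


torch.manual_seed(0)
logits = torch.randn(1, 1000)  # stand-in for `preds = model(img)`
top_probs, top_idx = top_k_predictions(logits, k=5)
print(top_idx[0].tolist())
print(top_probs[0].tolist())
```

With the real model, wrapping the forward pass in `torch.no_grad()` avoids building the autograd graph during inference.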

caution

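This model expects inputs normalized with a mean of 127.5 and a scale of 127.5 (on the 0-255 pixel range), so the default normalization should be replaced by

normalize_tf = transforms.Normalize(mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5])

Keeping the default normalization instead reduces classification accuracy by about 5% on the ImageNet-1K classification task.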
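The arithmetic behind this can be checked directly. The sketch below uses plain Python and hypothetical helper names to show that subtracting 127.5 and dividing by 127.5 on 0-255 pixels gives the same result as scaling to [0, 1] and applying mean 0.5 and std 0.5. It also shows how far the commonly used ImageNet channel statistics land from that range, on the assumption that they are the "default" normalization meant here.

```python
def normalize_tf_style(pixel):
    """Map an 8-bit pixel value (0..255) to [-1, 1] via (p - 127.5) / 127.5."""
    return (pixel - 127.5) / 127.5


def normalize_unit_style(pixel, mean=0.5, std=0.5):
    """Scale the pixel to [0, 1] first, then apply (x - mean) / std."""
    return (pixel / 255.0 - mean) / std


def max_abs_difference():
    """Largest gap between the two conventions over all 8-bit pixel values."""
    return max(
        abs(normalize_tf_style(p) - normalize_unit_style(p)) for p in range(256)
    )


# Assumption: red-channel ImageNet statistics often used as a default elsewhere.
IMAGENET_MEAN_R = 0.485
IMAGENET_STD_R = 0.229

if __name__ == "__main__":
    print(normalize_tf_style(0), normalize_tf_style(255))
    print(max_abs_difference())
    print(normalize_unit_style(255, IMAGENET_MEAN_R, IMAGENET_STD_R))
```

The two conventions agree up to floating-point rounding, whereas the ImageNet statistics push a white pixel well past 1.0. The model would then see inputs on a different scale from the one described above.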
