MobileNetV3
An implementation of MobileNetV3 with PyTorch
Theory
You can find the MobileNetV3 paper at Searching for MobileNetV3.
Prepare data
- CIFAR-10
- CIFAR-100
- SVHN
- Tiny-ImageNet
- ImageNet: please move the validation images into labeled subfolders; you can use the script here. A sketch of what that restructuring does is shown below.
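For reference, a minimal sketch of the restructuring such a script performs, assuming you have a mapping file from validation image name to WordNet ID (the mapping filename and format below are illustrative assumptions, not part of this repo):

```python
import os
import shutil

# Hypothetical mapping file with lines like:
#   ILSVRC2012_val_00000001.JPEG n01751748
# Adapt the path and format to your ImageNet download.
val_dir = "data/imagenet/val"
with open("val_image_to_synset.txt") as f:
    for line in f:
        filename, synset = line.split()
        class_dir = os.path.join(val_dir, synset)
        os.makedirs(class_dir, exist_ok=True)  # one subfolder per class
        shutil.move(os.path.join(val_dir, filename),
                    os.path.join(class_dir, filename))
```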
Train
- Train from scratch:
```bash
CUDA_VISIBLE_DEVICES=3 python train.py --batch-size=128 --mode=small \
--print-freq=100 --dataset=CIFAR100 --ema-decay=0 --label-smoothing=0.1 \
--lr=0.3 --save-epoch-freq=1000 --lr-decay=cos --lr-min=0 \
--warmup-epochs=5 --weight-decay=6e-5 --num-epochs=200 --width-multiplier=1 \
-nbd -zero-gamma -mixup
```
where the parameters have the following meanings (a sketch of several of these tricks follows the list):

- batch-size: batch size used for training.
- mode: use MobileNetV3-Small (if set to small) or MobileNetV3-Large (if set to large).
- dataset: which dataset to use (CIFAR10, CIFAR100, SVHN, TinyImageNet, or ImageNet).
- ema-decay: decay rate of the EMA of model weights; if set to 0, EMA is not used.
- label-smoothing: the $\epsilon$ used in label smoothing; if set to 0, label smoothing is not used.
- lr-decay: learning rate decay schedule, step or cos.
- lr-min: minimum learning rate in cosine decay.
- warmup-epochs: number of warmup epochs used with cosine decay.
- num-epochs: total number of training epochs.
- nbd: no bias decay (biases and BN parameters are excluded from weight decay).
- zero-gamma: initialize the $\gamma$ of the last BN in each block to zero.
- mixup: use Mixup.
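train.py wires these tricks in for you; as a rough, self-contained sketch of what the flags correspond to (function names and defaults below are illustrative, not this repo's actual code):

```python
import math

import numpy as np
import torch
import torch.nn as nn


def label_smoothing_ce(logits, target, epsilon=0.1):
    """Cross entropy with label smoothing (what --label-smoothing enables)."""
    log_probs = torch.log_softmax(logits, dim=-1)
    nll = -log_probs.gather(dim=-1, index=target.unsqueeze(1)).squeeze(1)
    uniform = -log_probs.mean(dim=-1)  # loss against a uniform target
    return ((1.0 - epsilon) * nll + epsilon * uniform).mean()


def mixup_batch(x, y, alpha=0.2):
    """Mixup (-mixup): blend random pairs of images with weight lam."""
    lam = float(np.random.beta(alpha, alpha))
    index = torch.randperm(x.size(0))
    return lam * x + (1.0 - lam) * x[index], y, y[index], lam


def no_bias_decay_groups(model, weight_decay=6e-5):
    """No bias decay (-nbd): exclude biases and BN affine params from L2."""
    decay, no_decay = [], []
    for name, p in model.named_parameters():
        if not p.requires_grad:
            continue
        (no_decay if p.ndim == 1 or name.endswith(".bias") else decay).append(p)
    return [{"params": decay, "weight_decay": weight_decay},
            {"params": no_decay, "weight_decay": 0.0}]


def zero_last_bn_gamma(block_last_bns):
    """Zero-gamma (-zero-gamma): start each residual branch as an identity
    map by zeroing the weight (gamma) of the last BN in every block."""
    for bn in block_last_bns:  # caller collects the last BN of each block
        nn.init.zeros_(bn.weight)


def cosine_warmup_lr(epoch, lr_max, lr_min=0.0, warmup_epochs=5, total_epochs=200):
    """Cosine decay with linear warmup (--lr-decay=cos, --lr-min, --warmup-epochs)."""
    if epoch < warmup_epochs:
        return lr_max * (epoch + 1) / warmup_epochs
    t = (epoch - warmup_epochs) / max(1, total_epochs - warmup_epochs)
    return lr_min + 0.5 * (lr_max - lr_min) * (1.0 + math.cos(math.pi * t))
```

With mixup, the training loss is then `lam * criterion(pred, y_a) + (1 - lam) * criterion(pred, y_b)`, and the parameter groups feed the optimizer directly, e.g. `torch.optim.SGD(no_bias_decay_groups(model), lr=0.3, momentum=0.9)`.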
Pretrained models
We provide a pretrained MobileNetV3-Small model in the pretrained folder.
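Loading such a checkpoint might look like the following; the import path, constructor arguments, and checkpoint filename are assumptions, so check the repo and the pretrained folder for the actual names:

```python
import torch

from mobilenetv3 import MobileNetV3  # import path and class name assumed

# Hypothetical checkpoint name -- look in pretrained/ for the real file.
model = MobileNetV3(mode="small", num_classes=1000, width_multiplier=1.0)
state = torch.load("pretrained/mobilenetv3-small.pth", map_location="cpu")
model.load_state_dict(state)
model.eval()  # switch BN/dropout to inference mode before evaluating
```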
Experiments
Training settings
on ImageNet
```bash
CUDA_VISIBLE_DEVICES=5 python train.py --batch-size=128 --mode=small --print-freq=2000 --dataset=imagenet \
--ema-decay=0.99 --label-smoothing=0.1 --lr=0.1 --save-epoch-freq=50 --lr-decay=cos --lr-min=0 --warmup-epochs=5 \
--weight-decay=1e-5 --num-epochs=250 --num-workers=2 --width-multiplier=1 -dali -nbd -mixup -zero-gamma -save
```
on CIFAR-10
```bash
CUDA_VISIBLE_DEVICES=1 python train.py --batch-size=128 --mode=small --print-freq=100 --dataset=CIFAR10 \
--ema-decay=0 --label-smoothing=0 --lr=0.35 --save-epoch-freq=1000 --lr-decay=cos --lr-min=0 \
--warmup-epochs=5 --weight-decay=6e-5 --num-epochs=400 --num-workers=2 --width-multiplier=1
```
on CIFAR-100
```bash
CUDA_VISIBLE_DEVICES=1 python train.py --batch-size=128 --mode=small --print-freq=100 --dataset=CIFAR100 \
--ema-decay=0 --label-smoothing=0 --lr=0.35 --save-epoch-freq=1000 --lr-decay=cos --lr-min=0 \
--warmup-epochs=5 --weight-decay=6e-5 --num-epochs=400 --num-workers=2 --width-multiplier=1
```
Using more tricks:
```bash
CUDA_VISIBLE_DEVICES=1 python train.py --batch-size=128 --mode=small --print-freq=100 --dataset=CIFAR100 \
--ema-decay=0.999 --label-smoothing=0.1 --lr=0.35 --save-epoch-freq=1000 --lr-decay=cos --lr-min=0 \
--warmup-epochs=5 --weight-decay=6e-5 --num-epochs=400 --num-workers=2 --width-multiplier=1 \
-zero-gamma -nbd -mixup
```
on SVHN
```bash
CUDA_VISIBLE_DEVICES=3 python train.py --batch-size=128 --mode=small --print-freq=1000 --dataset=SVHN \
--ema-decay=0 --label-smoothing=0 --lr=0.35 --save-epoch-freq=1000 --lr-decay=cos --lr-min=0 \
--warmup-epochs=5 --weight-decay=6e-5 --num-epochs=20 --num-workers=2 --width-multiplier=1
```
on Tiny-ImageNet
```bash
CUDA_VISIBLE_DEVICES=7 python train.py --batch-size=128 --mode=small --print-freq=100 --dataset=tinyimagenet \
--data-dir=/media/data2/chenjiarong/ImageData/tiny-imagenet --ema-decay=0 --label-smoothing=0 --lr=0.15 \
--save-epoch-freq=1000 --lr-decay=cos --lr-min=0 --warmup-epochs=5 --weight-decay=6e-5 --num-epochs=200 \
--num-workers=2 --width-multiplier=1 -dali
```
Using more tricks:
```bash
CUDA_VISIBLE_DEVICES=7 python train.py --batch-size=128 --mode=small --print-freq=100 --dataset=tinyimagenet \
--data-dir=/media/data2/chenjiarong/ImageData/tiny-imagenet --ema-decay=0.999 --label-smoothing=0.1 --lr=0.15 \
--save-epoch-freq=1000 --lr-decay=cos --lr-min=0 --warmup-epochs=5 --weight-decay=6e-5 --num-epochs=200 \
--num-workers=2 --width-multiplier=1 -dali -nbd -mixup
```
MobileNetV3-Large
on ImageNet
| Model | Madds | Parameters | Top1-acc | Top5-acc |
|---|---|---|---|---|
| Official 1.0 | 219 M | 5.4 M | 75.2% | - |
| Ours 1.0 | 216.6 M | 5.47 M | - | - |
on CIFAR-10
| Model | Madds | Parameters | Top1-acc | Top5-acc |
|---|---|---|---|---|
| Ours 1.0 | 66.47 M | 4.21 M | - | - |
on CIFAR-100
| Model | Madds | Parameters | Top1-acc | Top5-acc |
|---|---|---|---|---|
| Ours 1.0 | 66.58 M | 4.32 M | - | - |
MobileNetV3-Small
on ImageNet
| Model | Madds | Parameters | Top1-acc | Top5-acc |
|---|---|---|---|---|
| Official 1.0 | 56.5 M | 2.53 M | 67.4% | - |
| Ours 1.0 | 56.51 M | 2.53 M | 67.52% | 87.58% |
The pretrained model with 67.52% top-1 accuracy is provided in the pretrained folder.
on CIFAR-10 (Average accuracy of 5 runs)
| Model | Madds | Parameters | Top1-acc | Top5-acc |
|---|---|---|---|---|
| Ours 1.0 | 17.51 M | 1.52 M | 92.97% | - |
on CIFAR-100 (Average accuracy of 5 runs)
| Model | Madds | Parameters | Top1-acc | Top5-acc |
|---|---|---|---|---|
| Ours 1.0 | 17.60 M | 1.61 M | 73.69% | 92.31% |
| More Tricks | same | same | 76.24% | 92.58% |
on SVHN (Average accuracy of 5 runs)
| Model | Madds | Parameters | Top1-acc | Top5-acc |
|---|---|---|---|---|
| Ours 1.0 | 17.51 M | 1.52 M | 97.92% | - |
on Tiny-ImageNet (Average accuracy of 5 runs)
| Model | Madds | Parameters | Top1-acc | Top5-acc |
|---|---|---|---|---|
| Ours 1.0 | 51.63 M | 1.71 M | 59.32% | 81.38% |
| More Tricks | same | same | 62.62% | 84.04% |
Dependency
This project uses Python 3.7 and PyTorch 1.1.0. The FLOPs and parameter counts are measured using torchsummaryX.
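Measuring them with torchsummaryX looks roughly like this; the model import path and constructor arguments are assumptions about this repo's layout:

```python
import torch
from torchsummaryX import summary

from mobilenetv3 import MobileNetV3  # import path and class name assumed

model = MobileNetV3(mode="small", num_classes=1000, width_multiplier=1.0)
# Prints a per-layer table of parameter counts and Mult-Adds (Madds)
# for a single 224x224 RGB input.
summary(model, torch.zeros(1, 3, 224, 224))
```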