pt.fractalnet
PyTorch Implementation of FractalNet
FractalNet: Ultra-deep neural networks without residuals
Larsson, Gustav, Michael Maire, and Gregory Shakhnarovich. "Fractalnet: Ultra-deep neural networks without residuals." arXiv preprint arXiv:1605.07648 (2016).
Requirements
pip install -r requirements.txt
- python 3
- pytorch >= 0.4
- torchvision
- numpy, tensorboard, tensorboardX
Usage
Default options are the same as in the paper, but I failed to reproduce the paper's results with them. So I experimented with many alternatives to improve performance and found a better configuration; "best" below refers to this option set.
# train (default)
python train.py --data cifar10 --name cifar10
# train (best)
python train.py --data cifar10 --name cifar10-best --init torch --gap 1 --pad reflect
For testing, you must specify the same name and the same options used for training. Some options are only required for training, such as epochs, dropout, droppath, init, and data augmentation.
# test (for deepest)
python test.py --data cifar10 --name cifar10
python test.py --data cifar10 --name cifar10-best --init torch --gap 1 --pad reflect
Run options
Train
$ python train.py --help
usage: Config [-h] --name NAME [--data DATA] [--batch_size BATCH_SIZE]
[--lr LR] [--momentum MOMENTUM] [--print_freq PRINT_FREQ]
[--gpu GPU] [--epochs EPOCHS] [--init_channels INIT_CHANNELS]
[--gdrop_ratio GDROP_RATIO] [--p_ldrop P_LDROP]
[--dropout_probs DROPOUT_PROBS] [--blocks BLOCKS]
[--columns COLUMNS] [--seed SEED] [--workers WORKERS]
[--aug_lv AUG_LV] [--off-drops] [--gap GAP] [--init INIT]
[--pad PAD] [--doubling] [--gdrop_type GDROP_TYPE]
[--dropout_pos DROPOUT_POS]
optional arguments:
-h, --help show this help message and exit
--name NAME
--data DATA CIFAR10 (default) / CIFAR100
--batch_size BATCH_SIZE
default: 100
--lr LR learning rate (default: 0.02)
--momentum MOMENTUM
--print_freq PRINT_FREQ
print frequency
--gpu GPU gpu device id
--epochs EPOCHS # of training epochs (default: 400)
--init_channels INIT_CHANNELS
doubling each block except the last (default: 64)
--gdrop_ratio GDROP_RATIO
ratio of global drop path (default: 0.5)
--p_ldrop P_LDROP local drop path probability (default: 0.15)
--dropout_probs DROPOUT_PROBS
dropout probs for each block with comma separated
(default: 0.0, 0.1, 0.2, 0.3, 0.4)
--blocks BLOCKS default: 5
--columns COLUMNS default: 3
--seed SEED random seed
--workers WORKERS # of workers
--aug_lv AUG_LV data augmentation level (0~2). 0: no augmentation, 1:
horizontal mirroring + [-4, 4] translation, 2: 1 +
cutout.
Experiment:
--off-drops turn off all dropout and droppath
--gap GAP 0: max-pool (default), 1: GAP - FC, 2: 1x1conv - GAP
--init INIT xavier (default) / he / torch (pytorch default)
--pad PAD zero (default) / reflect
--doubling turn on 1x1 conv channel doubling
--gdrop_type GDROP_TYPE
ps (per-sample, various gdrop per block) / ps-consist
(default; per-sample, consist global drop)
--dropout_pos DROPOUT_POS
CDBR (default; conv-dropout-BN-relu) / CBRD (conv-BN-
relu-dropout) / FD (fractal_block-dropout)
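The --p_ldrop, --gdrop_ratio, and --gdrop_type options above control FractalNet's drop-path regularization: at every join, incoming paths are averaged, local drop-path drops each path independently, and global drop-path keeps a single column for part of each batch. The sketch below shows only the local case; the class name and tensor layout are mine, not this repository's actual API.

```python
import torch
import torch.nn as nn

class DropPathJoin(nn.Module):
    """Illustrative join layer with local drop-path (not the repo's actual class).

    At every join, FractalNet averages its incoming paths. Local drop-path
    drops each path independently with probability p_ldrop, always keeping
    at least one path per sample.
    """
    def __init__(self, p_ldrop=0.15):
        super().__init__()
        self.p_ldrop = p_ldrop

    def forward(self, paths):
        # paths: list of tensors, each of shape [N, C, H, W]
        out = torch.stack(paths, dim=0)                      # [P, N, C, H, W]
        if not self.training or self.p_ldrop <= 0.:
            return out.mean(dim=0)

        P, N = out.size(0), out.size(1)
        # Per-sample keep mask: drop each path with probability p_ldrop.
        keep = (torch.rand(P, N, device=out.device) > self.p_ldrop).float()
        # Guarantee at least one surviving path for every sample.
        dead = (keep.sum(dim=0) == 0)
        if dead.any():
            dead_idx = dead.nonzero().view(-1)
            lucky = torch.randint(0, P, (dead_idx.numel(),), device=out.device)
            keep[lucky, dead_idx] = 1.
        # Average only over the surviving paths.
        keep = keep.view(P, N, 1, 1, 1)
        return (out * keep).sum(dim=0) / keep.sum(dim=0)
```

Global drop-path is not shown: per the help text, a --gdrop_ratio fraction of each batch keeps exactly one column for the whole network, either re-drawn per block (ps) or kept consistent across blocks (ps-consist, the default).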
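The --init option above selects the weight initialization scheme. A rough sketch of what the three choices typically mean in PyTorch follows; the function name is mine, and the exact distributions (uniform vs. normal) used by this repository may differ.

```python
import torch.nn as nn

def init_weights(model, mode="xavier"):
    # Sketch of the --init choices; "torch" keeps PyTorch's default initialization.
    if mode == "torch":
        return
    for m in model.modules():
        if isinstance(m, (nn.Conv2d, nn.Linear)):
            if mode == "xavier":
                nn.init.xavier_uniform_(m.weight)
            elif mode == "he":
                nn.init.kaiming_normal_(m.weight, nonlinearity="relu")
            if m.bias is not None:
                nn.init.zeros_(m.bias)
```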
Test
$ python test.py --help
usage: Config [-h] --name NAME [--data DATA] [--batch_size BATCH_SIZE]
[--print_freq PRINT_FREQ] [--gpu GPU]
[--init_channels INIT_CHANNELS] [--blocks BLOCKS]
[--columns COLUMNS] [--workers WORKERS] [--gap GAP] [--pad PAD]
[--doubling] [--dropout_pos DROPOUT_POS]
optional arguments:
-h, --help show this help message and exit
--name NAME
--data DATA CIFAR10 (default) / CIFAR100
--batch_size BATCH_SIZE
default: 200
--gpu GPU gpu device id
--init_channels INIT_CHANNELS
doubling each block except the last (default: 64)
--blocks BLOCKS default: 5
--columns COLUMNS default: 3
--workers WORKERS # of workers
Experiment:
--gap GAP 0: max-pool (default), 1: GAP - FC, 2: 1x1conv - GAP
--pad PAD zero (default) / reflect
--doubling turn on 1x1 conv channel doubling
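Both the train and test configs expose the architectural options --gap and --doubling. Below is a rough sketch of the three classifier-head variants selected by --gap (0: max-pool head, 1: GAP then FC, 2: 1x1 conv then GAP) and of a 1x1 channel-doubling convolution; the function names and exact layer placement are my own reading of the help text, not the repository's actual modules.

```python
import torch.nn as nn

def make_head(gap_mode, in_channels, n_classes):
    """Illustrative classifier heads for the --gap option (names are mine)."""
    if gap_mode == 0:
        # Paper-style head: assumes the last block's max-pool already reduced
        # the feature map to 1x1, so flattening yields [N, in_channels].
        return nn.Sequential(nn.Flatten(), nn.Linear(in_channels, n_classes))
    if gap_mode == 1:
        # GAP - FC: global average pooling followed by a fully connected layer.
        return nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                             nn.Linear(in_channels, n_classes))
    if gap_mode == 2:
        # 1x1conv - GAP: project to n_classes channels, then average pool.
        return nn.Sequential(nn.Conv2d(in_channels, n_classes, kernel_size=1),
                             nn.AdaptiveAvgPool2d(1), nn.Flatten())
    raise ValueError("gap_mode must be 0, 1, or 2")

def doubling_conv(in_channels):
    # --doubling: a 1x1 conv that doubles the channel count between blocks (sketch).
    return nn.Conv2d(in_channels, in_channels * 2, kernel_size=1)
```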
Results
Disclaimer
- The data augmentation method for C10++ is not described in the paper. Here I used the Cutout method (sketched after this list) in addition to horizontal mirroring and [-4, 4] translation.
- In C10++ and C100++, the number of training epochs was increased to 600 when necessary.
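Cutout, used above for C10++/C100++, masks a random square patch of each training image to zero. A minimal, self-contained version is sketched below; the patch size actually used in this repository is not stated here, so 16 pixels (a common choice for CIFAR) is only an assumption.

```python
import torch

class Cutout:
    """Minimal Cutout transform: zero out a random square patch of a tensor image."""
    def __init__(self, length=16):
        self.length = length  # side length of the square, in pixels (assumed)

    def __call__(self, img):
        # img: [C, H, W] tensor (apply after ToTensor())
        _, h, w = img.shape
        cy = torch.randint(0, h, (1,)).item()
        cx = torch.randint(0, w, (1,)).item()
        y1, y2 = max(0, cy - self.length // 2), min(h, cy + self.length // 2)
        x1, x2 = max(0, cx - self.length // 2), min(w, cx + self.length // 2)
        img[:, y1:y2, x1:x2] = 0.
        return img
```

It would typically be composed after the aug_lv 1 transforms (random crop with 4-pixel padding and horizontal flip) and ToTensor(), matching the aug_lv 2 description above.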
Best plots
Best results (error rates) for CIFAR10 and CIFAR100: 3.91% and 21.64%, respectively.
CIFAR10
Method | C10 | C10+ | C10++ |
---|---|---|---|
Default | 12.07% | 6.06% | 5.05% |
⠀⠀+ drop-path + dropout | 9.64% | 6.44% | 5.57% |
⠀⠀⠀⠀=> deepest | 10.14% | 7.19% | 5.85% |
Best | 10.87% | 6.16% | 5.19% |
⠀⠀+ drop-path + dropout | 8.47% | 6.04% | 5.49% |
⠀⠀⠀⠀=> deepest | 9.47% | 6.90% | 6.09% |
Paper | 10.18% | 5.22% | 5.11% |
⠀⠀+ drop-path + dropout | 7.33% | 4.60% | 4.59% |
⠀⠀⠀⠀=> deepest | 7.27% | 4.68% | 4.63% |
Best + FDO + local DP | 8.61% | 5.25% | 3.91% |
As mentioned above, the results of the paper were not reproduced. After several attempts I settled on the best option, --init torch --gap 1 --pad reflect, which is about 1.2% better than the default.
Furthermore, I got better results with less regularization. The last row, "Best + FDO (fractal dropout) + local DP (no global drop-path)", uses the run options --init torch --gap 1 --pad reflect --gdrop_ratio 0. --dropout_pos FD.
CIFAR100
Method | C100 | C100+ | C100++ |
---|---|---|---|
Default | | | |
⠀⠀+ drop-path + dropout | 34.04% | 28.71% | 27.73% |
⠀⠀⠀⠀=> deepest | 36.69% | 31.95% | 30.66% |
Best | 36.99% | | |
⠀⠀+ drop-path + dropout | 31.84% | 29.18% | 29.04% |
⠀⠀⠀⠀=> deepest | 34.75% | 32.45% | 32.41% |
Paper | 35.34% | 23.30% | 22.85% |
⠀⠀+ drop-path + dropout | 28.20% | 23.73% | 23.36% |
⠀⠀⠀⠀=> deepest | 29.05% | 24.32% | 23.60% |
Best + FDO + local DP | 32.11% | 24.08% | 22.02% |
Best + FDO + local DP + doubling | 33.65% | 24.36% | 21.64% |
As with C10, the paper's results were not reproduced on C100. I did not run as many experiments on C100 as on C10, but the results are similar: "Best + FDO + local DP" is better. However, there are some differences: the default option beats the best option on C100+ and C100++, and doubling works better on C100 than on C10.
ETC
- The Keras implementation snf/keras-fractalnet also failed to reproduce the results of the paper.
- If you are familiar with Korean, there are more discussions and results in exp-note (kor).