SwitchNorm_Segmentation
                                
Switchable Normalization for semantic image segmentation and scene parsing.
Switchable Normalization for Semantic Segmentation
This repository contains the code for using Switchable Normalization (SN) in semantic image segmentation, as proposed in the paper "Differentiable Learning-to-Normalize via Switchable Normalization".
It implements the experiments presented in that paper, built on the open-source semantic segmentation framework Scene Parsing on MIT ADE20K.
Update
- 2018/9/26: The code and trained models for semantic segmentation on ADE20K using SN have been released!
- More results and models will be released soon.
Citation
You are encouraged to cite the following paper if you use SN in your research or wish to refer to the baseline results.
@article{SwitchableNorm,
  title={Differentiable Learning-to-Normalize via Switchable Normalization},
  author={Ping Luo and Jiamin Ren and Zhanglin Peng},
  journal={arXiv:1806.10779},
  year={2018}
}
Getting Started
Use git to clone this repository:
git clone https://github.com/switchablenorms/SwitchNorm_Segmentation.git
Environment
The code has been tested under the following configurations.
- Hardware: 1-8 GPUs (each with at least 12 GB of GPU memory)
- Software: CUDA 9.0, Python 3.6, PyTorch 0.4.0, tensorboardX
Installation & Data Preparation
Please check the Environment, Training, and Evaluation subsections of the Scene Parsing on MIT ADE20K repo for a quick start.
Pre-trained Models
Download the SN-based ImageNet pretrained models and put them into {repo_root}/pretrained_sn.
ImageNet pre-trained models
The backbone models with SN, pretrained on ImageNet, are available in the format used by the segmentation framework above and by this repo.
- ResNet50v1+SN(8,2) [pretrained_SN(8,2)]
For more pretrained models with SN, please refer to the switchablenorms/Switchable-Normalization repo.
The following script converts a model trained with Switchable-Normalization into the format used by the semantic segmentation codebase: ./pretrained_sn/convert_sn.py
usage: python -u convert_sn.py
NOTE: The parameter keys in the pretrained model checkpoint must match the keys in the backbone model EXACTLY. Load the pretrained model that corresponds to your segmentation architecture.
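A quick way to see whether the keys match is to diff the two sets of parameter names before loading. The sketch below is only an illustration: `diff_keys` and the key names are hypothetical and not part of this repo. With PyTorch, the two inputs would typically be the keys of the loaded checkpoint dictionary and `model.state_dict().keys()`.

```python
from typing import Iterable, List, Tuple


def diff_keys(checkpoint_keys: Iterable[str],
              model_keys: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Return (missing, unexpected) parameter names.

    missing    : keys the model expects but the checkpoint lacks
    unexpected : keys the checkpoint contains but the model does not use
    """
    ckpt, model = set(checkpoint_keys), set(model_keys)
    return sorted(model - ckpt), sorted(ckpt - model)


# Hypothetical key names, for illustration only. A common source of
# mismatch is a prefix such as "module." left in a saved checkpoint.
model_keys = ["conv1.weight", "sn1.weight", "sn1.mean_weight"]
ckpt_keys = ["module.conv1.weight", "module.sn1.weight", "module.sn1.mean_weight"]

missing, unexpected = diff_keys(ckpt_keys, model_keys)
print("missing from checkpoint:", missing)
print("unexpected in checkpoint:", unexpected)
```

Both lists should be empty before the checkpoint is loaded into the backbone.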
Training
- The training strategies for the baseline models and the SN-based models on ADE20K are the same as in Scene Parsing on MIT ADE20K.
- The training script with the ResNet-50-sn backbone can be found here: ./scripts/train.sh
NOTE: The default architecture of this repo is Encoder: resnet50_dilated8 (resnetXX_dilatedYY: a customized resnetXX with dilated convolutions whose output feature map is 1/YY of the input size; see DeepLab for more details) and Decoder: c1_bilinear_deepsup (1 conv + bilinear upsample + deep supervision; see PSPNet for more details).
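To make the resnetXX_dilatedYY naming concrete, the sketch below estimates the encoder's output feature-map size for a given input size and output stride YY. The helper `feature_map_size` is hypothetical, and the use of ceiling division is an assumption based on the usual ResNet padding; the exact size in this repo may differ by a pixel depending on padding and rounding.

```python
import math
from typing import Tuple


def feature_map_size(height: int, width: int, output_stride: int) -> Tuple[int, int]:
    """Approximate spatial size of the encoder output.

    Assumes every downsampling stage rounds up, so the overall effect
    is a ceiling division by the output stride.
    """
    return math.ceil(height / output_stride), math.ceil(width / output_stride)


# resnet50_dilated8 has an output stride of 8.
print(feature_map_size(512, 512, 8))
```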
Optional arguments (see full input arguments via ./train.py):
  --arch_encoder         architecture of the encoder network
  --arch_decoder         architecture of the decoder network
  --weights_encoder      weights to finetune the encoder network
  --weights_decoder      weights to finetune the decoder network
  --list_train           the list to load the training data 
  --root_dataset         the path of the dataset
  --batch_size_per_gpu   input batch size
  --start_epoch          epoch to start training. (continue from a checkpoint loaded via weights_encoder & weights_decoder)
  
NOTE: In this repo, --start_epoch allows training to resume from the checkpoints loaded via --weights_encoder and --weights_decoder, which are generated automatically during training. To train from scratch, set --start_epoch to 1 and leave --weights_encoder and --weights_decoder empty.
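The interaction of these three flags can be summarized in a small sketch. This is not the logic of ./train.py: the parser defaults, the `training_mode` helper, and the "finetune" label for the case of weights given with --start_epoch 1 are assumptions made for illustration.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Only the three flags discussed above; the defaults are assumed.
    parser = argparse.ArgumentParser()
    parser.add_argument("--start_epoch", type=int, default=1)
    parser.add_argument("--weights_encoder", default="")
    parser.add_argument("--weights_decoder", default="")
    return parser


def training_mode(args: argparse.Namespace) -> str:
    """Classify a flag combination as scratch, finetune, or resume."""
    has_weights = bool(args.weights_encoder) or bool(args.weights_decoder)
    if args.start_epoch < 1:
        raise ValueError("--start_epoch must be at least 1")
    if args.start_epoch == 1:
        return "finetune" if has_weights else "scratch"
    if not (args.weights_encoder and args.weights_decoder):
        raise ValueError("resuming needs both --weights_encoder and --weights_decoder")
    return "resume"


parser = build_parser()
print(training_mode(parser.parse_args([])))
print(training_mode(parser.parse_args(
    ["--start_epoch", "5",
     "--weights_encoder", "encoder.pth",    # hypothetical file names
     "--weights_decoder", "decoder.pth"])))
```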
Evaluation
- The evaluation script with the ResNet-50-sn backbone can be found here: ./scripts/evaluate.sh
Optional arguments (see full input arguments via ./eval.py):
  --arch_encoder         architecture of the encoder network
  --arch_decoder         architecture of the decoder network
  --suffix               which snapshot to load
  --list_val             the list to load the validation data 
  --root_dataset         the path of the dataset
  --imgSize              list of input image sizes
--imgSize enables single-scale or multi-scale inference. When --imgSize is a single int, single-scale inference is run. When --imgSize is a list of ints, multi-scale testing is applied.
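One common way to accept either a single size or a list of sizes is an argparse option with `nargs="+"`. The sketch below shows that pattern only; it is not taken from ./eval.py, and the default value, the sizes, and the `inference_mode` helper are assumptions.

```python
import argparse
from typing import List

parser = argparse.ArgumentParser()
# nargs="+" collects one or more ints into a list.
parser.add_argument("--imgSize", type=int, nargs="+", default=[450])


def inference_mode(img_sizes: List[int]) -> str:
    """One size means single-scale inference; several mean multi-scale."""
    return "single-scale" if len(img_sizes) == 1 else "multi-scale"


single = parser.parse_args(["--imgSize", "450"])
multi = parser.parse_args(["--imgSize", "300", "400", "500"])
print(single.imgSize, inference_mode(single.imgSize))
print(multi.imgSize, inference_mode(multi.imgSize))
```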
Main Results
Semantic Segmentation Results on ADE20K
The experimental results are reported on the ADE20K validation set. MS test is short for multi-scale test. sync BN denotes multi-GPU synchronized batch normalization. More results and models will be released soon.
| Architecture | Norm | MS test | Mean IoU | Pixel Acc. | Overall Score | Download | 
|---|---|---|---|---|---|---|
| ResNet50_dilated8 + c1_bilinear_deepsup | sync BN | no | 36.43 | 77.30 | 56.87 | encoder decoder | 
| ResNet50_dilated8 + c1_bilinear_deepsup | GN | no | 35.66 | 77.24 | 56.45 | encoder decoder | 
| ResNet50_dilated8 + c1_bilinear_deepsup | SN-(8,2) | no | 38.72 | 78.90 | 58.82 | encoder decoder | 
| ResNet50_dilated8 + c1_bilinear_deepsup | sync BN | yes | 37.69 | 78.29 | 57.99 | -- | 
| ResNet50_dilated8 + c1_bilinear_deepsup | GN | yes | 36.32 | 77.77 | 57.05 | -- | 
| ResNet50_dilated8 + c1_bilinear_deepsup | SN-(8,2) | yes | 39.21 | 79.20 | 59.21 | -- | 
NOTE: For all settings in this repo, we employ ResNet as the backbone network, using the original 7×7 kernel size in the first convolution layer. This differs from the MIT framework, which adopts 3 convolution layers with 3×3 kernels at the bottom of the network. See ./models/resnet_v1_sn.py for details.
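
The Overall Score column in the table above is not defined in this README. The reported numbers are consistent with the mean of Mean IoU and Pixel Acc., which is assumed in the sketch below. The check reproduces every row to within 0.01; the small difference for the single-scale SN-(8,2) row presumably comes from rounding of the two inputs.

```python
# (norm, MS test, Mean IoU, Pixel Acc., reported Overall Score), from the table above.
rows = [
    ("sync BN",  "no",  36.43, 77.30, 56.87),
    ("GN",       "no",  35.66, 77.24, 56.45),
    ("SN-(8,2)", "no",  38.72, 78.90, 58.82),
    ("sync BN",  "yes", 37.69, 78.29, 57.99),
    ("GN",       "yes", 36.32, 77.77, 57.05),
    ("SN-(8,2)", "yes", 39.21, 79.20, 59.21),
]


def overall_score(mean_iou: float, pixel_acc: float) -> float:
    """Assumed definition: plain average of the two metrics."""
    return (mean_iou + pixel_acc) / 2


for norm, ms_test, miou, acc, reported in rows:
    computed = overall_score(miou, acc)
    print(f"{norm:9s} MS={ms_test:3s} computed={computed:.3f} reported={reported:.2f}")
```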