VW: Varying Window Attention (ICLR 2024 poster)
Lawin Transformer: Improving New-Era Vision Backbones with Multi-Scale Representations for Semantic Segmentation
[Paper] [Poster] [Supplementary] [Code]
🔥🔥🔥 A 4-page abstract version was accepted at the Transformers for Vision workshop @ CVPR 2023.
🔥🔥🔥 The full version, Multi-Scale Representations by Varying Window Attention for Semantic Segmentation, was accepted at ICLR 2024.
Installation
See installation instructions.
Datasets
See Preparing Datasets for MaskFormer.
Train
For more usage details, see Getting Started with MaskFormer.
MaskFormer
Swin-Tiny
python ./train_net.py \
--resume --num-gpus 2 --dist-url auto \
--config-file configs/ade20k-150/swin/lawin/lawin_maskformer_swin_tiny_bs16_160k.yaml \
OUTPUT_DIR path/to/tiny TEST.EVAL_PERIOD 10000 MODEL.MASK_FORMER.SIZE_DIVISIBILITY 64
Swin-Small
python ./train_net.py \
--resume --num-gpus 4 --dist-url auto \
--config-file configs/ade20k-150/swin/lawin/lawin_maskformer_swin_small_bs16_160k.yaml \
OUTPUT_DIR path/to/small TEST.EVAL_PERIOD 10000 MODEL.MASK_FORMER.SIZE_DIVISIBILITY 64
Swin-Base
python ./train_net.py \
--resume --num-gpus 8 --dist-url auto \
--config-file configs/ade20k-150/swin/lawin/lawin_maskformer_swin_base_IN21k_384_bs16_160k_res640.yaml \
OUTPUT_DIR path/to/base TEST.EVAL_PERIOD 10000 MODEL.MASK_FORMER.SIZE_DIVISIBILITY 64
Swin-Large
python ./train_net.py \
--resume --num-gpus 16 --dist-url auto \
--config-file configs/ade20k-150/swin/lawin/lawin_maskformer_swin_large_IN21k_384_bs16_160k_res640.yaml \
OUTPUT_DIR path/to/large TEST.EVAL_PERIOD 10000 MODEL.MASK_FORMER.SIZE_DIVISIBILITY 64
Mask2Former
Swin-Tiny
cd Mask2Former
python ./train_net.py \
--resume --num-gpus 2 --dist-url auto \
--config-file configs/ade20k/semantic-segmentation/swin/lawin/lawin_maskformer2_swin_tiny_bs16_160k.yaml \
OUTPUT_DIR path/to/tiny TEST.EVAL_PERIOD 10000 MODEL.MASK_FORMER.SIZE_DIVISIBILITY 64
Swin-Small
cd Mask2Former
python ./train_net.py \
--resume --num-gpus 4 --dist-url auto \
--config-file configs/ade20k/semantic-segmentation/swin/lawin/lawin_maskformer2_swin_small_bs16_160k.yaml \
OUTPUT_DIR path/to/small TEST.EVAL_PERIOD 10000 MODEL.MASK_FORMER.SIZE_DIVISIBILITY 64
Swin-Base
cd Mask2Former
python ./train_net.py \
--resume --num-gpus 8 --dist-url auto \
--config-file configs/ade20k/semantic-segmentation/swin/lawin/lawin_maskformer2_swin_base_IN21k_384_bs16_160k_res640.yaml \
OUTPUT_DIR path/to/base TEST.EVAL_PERIOD 10000 MODEL.MASK_FORMER.SIZE_DIVISIBILITY 64
Swin-Large
cd Mask2Former
python ./train_net.py \
--resume --num-gpus 16 --dist-url auto \
--config-file configs/ade20k/semantic-segmentation/swin/lawin/lawin_maskformer2_swin_large_IN21k_384_bs16_160k_res640.yaml \
OUTPUT_DIR path/to/large TEST.EVAL_PERIOD 10000 MODEL.MASK_FORMER.SIZE_DIVISIBILITY 64
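Note that every config name above carries bs16 while the GPU count ranges from 2 to 16; under detectron2's convention, SOLVER.IMS_PER_BATCH is the total batch size split across all GPUs, so the effective batch stays at 16 and only the per-GPU load changes. A minimal sketch of that arithmetic (the helper name is ours, not part of the repo):

```python
def per_gpu_batch(total_batch: int, num_gpus: int) -> int:
    """Images each GPU processes per iteration, assuming detectron2's
    convention that the configured batch size is the *total* across GPUs."""
    if total_batch % num_gpus != 0:
        raise ValueError("total batch must divide evenly across GPUs")
    return total_batch // num_gpus

# The commands above pair bs16 configs with 2, 4, 8, or 16 GPUs:
for gpus in (2, 4, 8, 16):
    print(f"{gpus} GPUs -> {per_gpu_batch(16, gpus)} images per GPU")
```

If you train with fewer GPUs than listed, keeping the total batch at 16 preserves the published training setup at the cost of more images per GPU.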
Evaluation
MaskFormer
python ./train_net.py \
--eval-only --num-gpus NGPUS --dist-url auto \
--config-file path/to/config \
MODEL.WEIGHTS path/to/weight TEST.AUG.ENABLED True MODEL.MASK_FORMER.SIZE_DIVISIBILITY 64
Mask2Former
cd Mask2Former
python ./train_net.py \
--eval-only --num-gpus NGPUS --dist-url auto \
--config-file path/to/config \
MODEL.WEIGHTS path/to/weight TEST.AUG.ENABLED True MODEL.MASK_FORMER.SIZE_DIVISIBILITY 64
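Every command passes MODEL.MASK_FORMER.SIZE_DIVISIBILITY 64, which asks the model to pad each input so its height and width are multiples of 64 before the backbone processes it. A rough sketch of the padding arithmetic (the helper name is illustrative, not the repo's API):

```python
import math

def padded_size(h: int, w: int, divisibility: int = 64) -> tuple[int, int]:
    """Round spatial dims up to the nearest multiple of `divisibility`,
    mirroring what a SIZE_DIVISIBILITY setting requires of the input."""
    pad_h = math.ceil(h / divisibility) * divisibility
    pad_w = math.ceil(w / divisibility) * divisibility
    return pad_h, pad_w

# A 512x512 crop is already aligned; an odd-sized test image gets padded up.
print(padded_size(512, 512))  # (512, 512)
print(padded_size(640, 427))  # (640, 448)
```

This matters mainly at evaluation time, where full-resolution images (unlike the fixed training crops) are rarely aligned to 64 on their own.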
Model
Name | Backbone | crop size | lr sched | mIoU | mIoU (ms+flip) | download |
---|---|---|---|---|---|---|
Lawin-MaskFormer | Swin-T | 512x512 | 160k | 47.4 | 49.0 | model |
Lawin-MaskFormer | Swin-S | 512x512 | 160k | 50.5 | 52.7 | model |
Lawin-MaskFormer | Swin-B | 640x640 | 160k | 53.8 | 54.6 | model |
Lawin-MaskFormer | Swin-L | 640x640 | 160k | 55.3 | 56.5 | model |
Name | Backbone | crop size | lr sched | mIoU | mIoU (ms+flip) | download |
---|---|---|---|---|---|---|
Lawin-Mask2Former | Swin-T | 512x512 | 160k | 48.2 | 50.5 | model |
Lawin-Mask2Former | Swin-S | 512x512 | 160k | 52.1 | 53.7 | model |
Lawin-Mask2Former | Swin-B | 640x640 | 160k | 54.6 | 56.0 | model |
Lawin-Mask2Former | Swin-L | 640x640 | 160k | 56.5 | 57.8 | model |
Citing Lawin Transformer
@article{yan2022lawin,
title={Lawin transformer: Improving semantic segmentation transformer with multi-scale representations via large window attention},
author={Yan, Haotian and Zhang, Chuang and Wu, Ming},
journal={arXiv preprint arXiv:2201.01615},
year={2022}
}