ClickSEG
ClickSEG: A Codebase for Click-Based Interactive Segmentation
Introduction
ClickSEG is a codebase for click-based interactive segmentation, developed on top of the RITM codebase.
What's New?
Compared with the RITM codebase, ClickSEG adds the following features:
1. Official implementations of the following papers.
Conditional Diffusion for Interactive Segmentation (ICCV2021) [Link]
FocalClick: Towards Practical Interactive Image Segmentation (CVPR2022) [Link]
2. More accurate crop augmentation during training.
The RITM codebase uses albumentations to crop and resize image-mask pairs for training. As a result, the crop size is fixed, which is unsuitable for training on a combined dataset with varying image sizes. In addition, the nearest-neighbor interpolation adopted in albumentations shifts the mask by 1 pixel towards the bottom-right, which harms boundary details, especially for the Refiner of FocalClick.
Therefore, we rewrote the augmentation, which is crucial for the final performance.
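To make the idea concrete, here is a minimal NumPy sketch of a crop whose size scales with the image, followed by a nearest-neighbor resize that samples at pixel centers. It is not the repository's implementation: the function names `random_scale_crop` and `resize_nearest_centered`, the `scale_range` parameter, and the use of nearest-neighbor resizing for the image as well as the mask are all assumptions made to keep the example dependency-free.

```python
import numpy as np


def resize_nearest_centered(arr, out_size):
    """Nearest-neighbor resize that samples the input at output pixel centers.

    Hypothetical helper: works on (H, W) masks and (H, W, C) images.
    """
    out_h, out_w = out_size
    in_h, in_w = arr.shape[:2]
    # Map the center of each output pixel back into input coordinates.
    rows = np.floor((np.arange(out_h) + 0.5) * in_h / out_h).astype(int)
    cols = np.floor((np.arange(out_w) + 0.5) * in_w / out_w).astype(int)
    rows = np.clip(rows, 0, in_h - 1)
    cols = np.clip(cols, 0, in_w - 1)
    return arr[rows][:, cols]


def random_scale_crop(image, mask, out_size, scale_range=(0.5, 1.0), rng=None):
    """Crop a window whose size is relative to the image, then resize to out_size.

    The same window is applied to the image and the mask so they stay aligned.
    """
    rng = np.random.default_rng() if rng is None else rng
    h, w = mask.shape[:2]
    scale = rng.uniform(*scale_range)
    crop_h = max(1, int(round(h * scale)))
    crop_w = max(1, int(round(w * scale)))
    top = int(rng.integers(0, h - crop_h + 1))
    left = int(rng.integers(0, w - crop_w + 1))
    image_crop = image[top:top + crop_h, left:left + crop_w]
    mask_crop = mask[top:top + crop_h, left:left + crop_w]
    return (
        resize_nearest_centered(image_crop, out_size),
        resize_nearest_centered(mask_crop, out_size),
    )
```

Because the crop window is a fraction of each image rather than a fixed number of pixels, the same call works for datasets with very different resolutions. In practice the image would usually be resized with a smoother interpolation than the mask.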
3. More backbones and more train/val data.
We add efficient backbones such as MobileNets and PPLCNet. In the standard configuration, all our models are trained on the COCO+LVIS dataset. We also train them on a large combined dataset and provide the trained weights to facilitate academic research and industrial applications. The combined dataset includes 8 datasets with high-quality annotations and diverse scenes: COCO [1], LVIS [2], ADE20K [3], MSRA10K [4], DUT [5], YoutubeVOS [6], ThinObject [7], HFlicker [8].
[1] Microsoft COCO: Common Objects in Context
[2] LVIS: A Dataset for Large Vocabulary Instance Segmentation
[3] Scene Parsing through ADE20K Dataset
[4] Salient Object Detection: A Benchmark
[5] Learning to Detect Salient Objects with Image-level Supervision
[6] YouTube-VOS: A Large-Scale Video Object Segmentation Benchmark
[7] Deep Interactive Thin Object Selection
[8] DoveNet: Deep Image Harmonization via Domain Verification
4. Dataset and evaluation code for starting from initial masks.
In the FocalClick paper, we propose a new dataset, DAVIS-585, which provides initial masks for evaluation. The dataset can be downloaded from the ClickSEG Google Drive. We also provide evaluation code in this codebase.
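When evaluation starts from an initial mask rather than from zero, a natural first step is to measure how close that mask already is to the ground truth. The sketch below computes the intersection over union (IoU) between two binary masks with NumPy. The function name `mask_iou` and the convention of returning 1.0 when both masks are empty are assumptions for illustration, not taken from the codebase's evaluation script.

```python
import numpy as np


def mask_iou(pred, gt):
    """Intersection over union between two binary masks of the same shape."""
    pred = np.asarray(pred).astype(bool)
    gt = np.asarray(gt).astype(bool)
    union = np.logical_or(pred, gt).sum()
    if union == 0:
        # Assumption: two empty masks are treated as a perfect match.
        return 1.0
    intersection = np.logical_and(pred, gt).sum()
    return float(intersection / union)


# Toy example: the initial mask covers the object plus two extra pixels.
gt_mask = np.zeros((4, 4), dtype=np.uint8)
gt_mask[1:3, 1:3] = 1
init_mask = np.zeros((4, 4), dtype=np.uint8)
init_mask[1:3, 1:4] = 1
initial_iou = mask_iou(init_mask, gt_mask)
```

An initial mask with a high IoU leaves only small errors to correct, which is consistent with the lower click counts reported in the "from init" columns for models that take a previous mask as input.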
User Guidelines
To use this codebase to train/val your own models, please follow the steps:
- Install the requirements by executing
pip install -r requirements.txt
- Prepare the dataset and pretrained backbone weights following Data_Weight_Preparation.md
- Train or validate the model following Train_Val_Guidance.md
Supported Methods
The trained model weights can be downloaded from the ClickSEG Google Drive.
CDNet: Conditional Diffusion for Interactive Segmentation (ICCV2021)
CONFIG
Input Size: 384 x 384
Previous Mask: No
Iterative Training: No
| Train Dataset | Model | GrabCut | Berkeley | Pascal VOC | COCO MVal | SBD | DAVIS | DAVIS585 from zero | DAVIS585 from init |
|---|---|---|---|---|---|---|---|---|---|
| | | NoC 85/90% | NoC 85/90% | NoC 85/90% | NoC 85/90% | NoC 85/90% | NoC 85/90% | NoC 85/90% | NoC 85/90% |
| SBD | ResNet34 (89.72 MB) | 1.86/2.18 | 1.95/3.27 | 3.61/4.51 | 4.13/5.88 | 5.18/7.89 | 5.00/6.89 | 6.68/9.59 | 5.04/7.06 |
| COCO+LVIS | ResNet34 (89.72 MB) | 1.40/1.52 | 1.47/2.06 | 2.74/3.30 | 2.51/3.88 | 4.30/7.04 | 4.27/5.56 | 4.86/7.37 | 4.21/5.92 |
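The "NoC 85/90%" columns report the Number of Clicks needed to reach 85% or 90% IoU, averaged over the samples of a dataset (lower is better). The following sketch shows how such a number can be derived from per-click IoU curves. The function names and the cap of 20 clicks for samples that never reach the threshold are assumptions here; the cap follows a common convention in the interactive segmentation literature, and this README does not state the value used.

```python
def clicks_to_reach(ious_per_click, threshold, max_clicks=20):
    """Number of clicks until the IoU first reaches `threshold`.

    `ious_per_click[i]` is the IoU after click i + 1. Samples that never
    reach the threshold are counted as `max_clicks` (assumed cap).
    """
    for n_clicks, iou in enumerate(ious_per_click[:max_clicks], start=1):
        if iou >= threshold:
            return n_clicks
    return max_clicks


def mean_noc(all_ious, threshold, max_clicks=20):
    """Average number of clicks over a dataset of per-sample IoU curves."""
    counts = [clicks_to_reach(ious, threshold, max_clicks) for ious in all_ious]
    return sum(counts) / len(counts)


# Three hypothetical samples: IoU after each successive click.
curves = [
    [0.50, 0.86, 0.91],  # reaches 85% at click 2 and 90% at click 3
    [0.90],              # reaches both thresholds at click 1
    [0.20, 0.30],        # never reaches either threshold
]
noc_85 = mean_noc(curves, 0.85)
noc_90 = mean_noc(curves, 0.90)
```

A table entry such as 1.86/2.18 therefore reads as an average of 1.86 clicks to reach 85% IoU and 2.18 clicks to reach 90% IoU.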
FocalClick: Towards Practical Interactive Image Segmentation (CVPR2022)
CONFIG
S1 version: coarse segmentor input size 128x128; refiner input size 256x256.
S2 version: coarse segmentor input size 256x256; refiner input size 256x256.
Previous Mask: Yes
Iterative Training: Yes
| Train Dataset | Model | GrabCut | Berkeley | Pascal VOC | COCO MVal | SBD | DAVIS | DAVIS585 from zero | DAVIS585 from init |
|---|---|---|---|---|---|---|---|---|---|
| | | NoC 85/90% | NoC 85/90% | NoC 85/90% | NoC 85/90% | NoC 85/90% | NoC 85/90% | NoC 85/90% | NoC 85/90% |
| COCO+LVIS | HRNet18s-S1 (16.58 MB) | 1.64/1.88 | 1.84/2.89 | 3.24/3.91 | 2.89/4.00 | 4.74/7.29 | 4.77/6.56 | 5.62/8.08 | 2.72/3.82 |
| COCO+LVIS | HRNet18s-S2 (16.58 MB) | 1.48/1.62 | 1.60/2.23 | 2.93/3.46 | 2.61/3.59 | 4.43/6.79 | 3.90/5.23 | 4.87/6.87 | 2.47/3.30 |
| COCO+LVIS | HRNet32-S2 (119.11 MB) | 1.64/1.80 | 1.70/2.36 | 2.80/3.35 | 2.62/3.65 | 4.24/6.61 | 4.01/5.39 | 4.77/6.84 | 2.32/3.09 |
| Combined Datasets | HRNet32-S2 (119.11 MB) | 1.30/1.34 | 1.49/1.85 | 2.84/3.38 | 2.80/3.85 | 4.35/6.61 | 3.19/4.81 | 4.80/6.63 | 2.37/3.26 |
| COCO+LVIS | SegFormerB0-S1 (14.38 MB) | 1.60/1.86 | 2.05/3.29 | 3.54/4.22 | 3.08/4.21 | 4.98/7.60 | 5.13/7.42 | 6.21/9.06 | 2.63/3.69 |
| COCO+LVIS | SegFormerB0-S2 (14.38 MB) | 1.40/1.66 | 1.59/2.27 | 2.97/3.52 | 2.65/3.59 | 4.56/6.86 | 4.04/5.49 | 5.01/7.22 | 2.21/3.08 |
| COCO+LVIS | SegFormerB3-S2 (174.56 MB) | 1.44/1.50 | 1.55/1.92 | 2.46/2.88 | 2.32/3.12 | 3.53/5.59 | 3.61/4.90 | 4.06/5.89 | 2.00/2.76 |
| Combined Datasets | SegFormerB3-S2 (174.56 MB) | 1.22/1.26 | 1.35/1.48 | 2.54/2.96 | 2.51/3.33 | 3.70/5.84 | 2.92/4.52 | 3.98/5.75 | 1.98/2.72 |
Efficient Baselines using MobileNets and PPLCNets
CONFIG
Input Size: 384x384.
Previous Mask: Yes
Iterative Training: Yes
| Train Dataset | Model | GrabCut | Berkeley | Pascal VOC | COCO MVal | SBD | DAVIS | DAVIS585 from zero | DAVIS585 from init |
|---|---|---|---|---|---|---|---|---|---|
| | | NoC 85/90% | NoC 85/90% | NoC 85/90% | NoC 85/90% | NoC 85/90% | NoC 85/90% | NoC 85/90% | NoC 85/90% |
| COCO+LVIS | MobileNetV2 (7.5 MB) | 1.82/2.02 | 1.95/2.69 | 2.97/3.61 | 2.74/3.73 | 4.44/6.75 | 3.65/5.81 | 5.25/7.28 | 2.15/3.04 |
| COCO+LVIS | PPLCNet (11.92 MB) | 1.74/1.92 | 1.96/2.66 | 2.95/3.51 | 2.72/3.75 | 4.41/6.66 | 4.40/5.78 | 5.11/7.28 | 2.03/2.90 |
| Combined Datasets | MobileNetV2 (7.5 MB) | 1.50/1.62 | 1.62/2.25 | 3.00/3.61 | 2.80/3.96 | 4.66/7.05 | 3.59/5.24 | 5.05/7.12 | 2.06/2.97 |
| Combined Datasets | PPLCNet (11.92 MB) | 1.46/1.66 | 1.63/1.99 | 2.88/3.44 | 2.75/3.89 | 4.44/6.74 | 3.65/5.34 | 5.02/6.98 | 1.96/2.81 |
License
The code is released under the MIT License. It is a short, permissive software license. Basically, you can do whatever you want as long as you include the original copyright and license notice in any copy of the software/source.
Acknowledgement
The core framework of this codebase follows: https://github.com/saic-vul/ritm_interactive_segmentation
Some code and pretrained weights are borrowed from:
https://github.com/Tramac/Lightweight-Segmentation
https://github.com/facebookresearch/video-nonlocal-net
https://github.com/visinf/1-stage-wseg
https://github.com/frotms/PP-LCNet-Pytorch
We thank the authors for their great work.
Citation
If you find this work useful for your research, please cite our papers:
@inproceedings{cdnet,
title={Conditional Diffusion for Interactive Segmentation},
author={Chen, Xi and Zhao, Zhiyan and Yu, Feiwu and Zhang, Yilei and Duan, Manni},
booktitle={ICCV},
year={2021}
}
@inproceedings{focalclick,
title={FocalClick: Towards Practical Interactive Image Segmentation},
author={Chen, Xi and Zhao, Zhiyan and Zhang, Yilei and Duan, Manni and Qi, Donglian and Zhao, Hengshuang},
booktitle={CVPR},
year={2022}
}