LabelAssemble
[ISBI 2023] Official Implementation for Label-Assemble
Label-Assemble
We introduce a new initiative, "label-assemble", which assembles large-scale datasets from available data resources instead of piling up data and labels from scratch for every task, overcoming the deficiencies of current annotation paradigms. We discover that learning from the classes in "negative examples" better delimits the decision boundary of the class of interest. This discovery is the foundation of the "label-assemble" initiative and underlines the necessity of assembling multiple datasets with diverse (yet partial) labels. It also sheds new light on the computer-aided diagnosis of rare diseases and emerging pandemics, for which "positive examples" are hard to collect, yet "negative examples" are relatively easy to assemble.

Paper
This repository provides the official implementation of the following papers:
Label-Assemble: Leveraging Multiple Datasets with Partial Labels
Mintong Kang¹, Yongyi Lu², Alan L. Yuille², and Zongwei Zhou²,*
¹ Zhejiang University, ² Johns Hopkins University
ISBI, 2023
paper | code | slides
Assembling Existing Labels from Public Datasets to Diagnose Novel Diseases: COVID-19 in Late 2019
Zengle Zhu¹, Mintong Kang², Alan L. Yuille³, and Zongwei Zhou³,*
¹ Tongji University, ² Zhejiang University, ³ Johns Hopkins University
Medical Imaging Meets NeurIPS, 2022
paper | code | slides
Dependencies
- Linux
- Python 3.6+
- PyTorch 1.2+
Usage of Label-Assemble
1. Clone the repository
$ git clone https://github.com/MrGiovanni/LabelAssemble.git
$ cd LabelAssemble
2. Prepare the datasets
We mainly use two datasets, COVIDx and ChestX-ray14. Download COVIDx from COVIDx-CXR2 and ChestX-ray14 from the NIH.
3. Train the model
Once the datasets are ready, train the model:
$ bash run.sh
This reproduces our experiments. Note that you need to change the parameters (e.g., the dataset paths) to match your own setup before running.
Train models on your own data
1. Implement Dataset Config
In config.py, there is a dict named CustomConfig.
CustomConfig = dict(
    train_img_path=None,
    val_img_path=None,
    test_img_path=None,
    train_file_path=None,
    val_file_path=None,
    test_file_path=None,
    class_num=None,
    class_filter=None,
    using_num=None,
)
Fill in these fields:
- train_img_path: the directory containing the training images.
- val_img_path: the directory containing the validation images.
- test_img_path: the directory containing the test images.
- train_file_path: the path to the training annotation file.
- val_file_path: the path to the validation annotation file.
- test_file_path: the path to the test annotation file.
- class_num: the number of classes.
- class_filter: the classes you want to use.
- using_num: the total number of images you want to use.
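As an illustration, a filled-in CustomConfig for a hypothetical dataset stored under /data/my_cxr might look like the sketch below. The paths, file names, chosen classes, and the missing_fields helper are all assumptions made for this example, not part of the repository; the helper only flags fields that are still left as None.

```python
# Hypothetical example values; replace the paths and classes with your own.
CustomConfig = dict(
    train_img_path="/data/my_cxr/images/train",
    val_img_path="/data/my_cxr/images/val",
    test_img_path="/data/my_cxr/images/test",
    train_file_path="/data/my_cxr/train.txt",
    val_file_path="/data/my_cxr/val.txt",
    test_file_path="/data/my_cxr/test.txt",
    class_num=2,                                  # assumed to equal len(class_filter)
    class_filter=["Pneumonia", "CovidPositive"],  # classes to keep
    using_num=5000,                               # total number of images to use
)


def missing_fields(config):
    """Hypothetical helper: return the keys whose value is still None."""
    return [key for key, value in config.items() if value is None]


if __name__ == "__main__":
    missing = missing_fields(CustomConfig)
    if missing:
        raise ValueError(f"CustomConfig fields not set: {missing}")
    print("CustomConfig looks complete")
```

Running such a check before training is a cheap way to catch a forgotten path early rather than midway through data loading.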
2. Implement Dataset
In datasets.py, there is a class named CustomDataset, a subclass of BaseDataset. You only need to implement one method, parse_line: its input is one line of the train/val/test file, and its output is the image label and the image path.
Remember that this method must be implemented.
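The sketch below shows what the body of parse_line could look like. It assumes, hypothetically, an annotation file in which each line reads `<image name>,<Label1>|<Label2>|...` (the `|` separator mirrors ChestX-ray14's "Finding Labels" column) and turns the listed findings into a multi-hot vector over the classes of interest. It is written as a standalone function because BaseDataset's exact interface is not described here; the CLASSES list and the img_dir argument are illustrative assumptions, and inside CustomDataset you would adapt this to the method signature the repository expects.

```python
import os
from typing import List, Tuple

# Hypothetical classes of interest; in practice these would come from
# class_filter in CustomConfig.
CLASSES = ["Cardiomegaly", "Effusion", "CovidPositive"]


def parse_line(line: str, img_dir: str,
               classes: List[str] = CLASSES) -> Tuple[List[int], str]:
    """Parse one line of the assumed format '<image name>,<Label1>|<Label2>|...'.

    Returns (label, image_path), where label is a multi-hot list aligned
    with `classes`.
    """
    name, findings = line.strip().split(",", 1)
    present = {f.strip() for f in findings.split("|") if f.strip()}
    # 1 if the class is listed on this line, 0 otherwise; findings that are
    # not in `classes` (e.g. "No Finding") are simply ignored.
    label = [1 if cls in present else 0 for cls in classes]
    return label, os.path.join(img_dir, name)


# Usage example
label, path = parse_line("00000001_000.png,Cardiomegaly|Effusion", "/data/cxr14")
# label -> [1, 1, 0]
```

Returning a multi-hot vector rather than a single class index keeps images with several co-occurring findings representable, which matters for chest X-ray datasets.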
3. Modify Config
In config.py, you should change three variables: assemble_datasets, class_interests, target_source.
For example:
assemble_datasets = ['covidx', 'chestxray14']
This means that COVIDx and ChestX-ray14 will be assembled.
class_interests = ['CovidPositive']
This means that the class of interest is CovidPositive.
target_source = [0, ]
This means that the dataset source we are interested in is 0. The code currently supports the following 15 diseases:
- Atelectasis
- Cardiomegaly
- Effusion
- Infiltration
- Mass
- Nodule
- Pneumonia
- Pneumothorax
- Consolidation
- Edema
- Emphysema
- Fibrosis
- Pleural_Thickening
- Hernia
- CovidPositive
If your dataset contains a disease that is not listed above, you need to add it to class_mapping.
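To make the relationship between these settings concrete, here is a minimal sketch. It assumes, hypothetically, that class_mapping maps each disease name to a contiguous integer index and that target_source holds indices into assemble_datasets; the real structures in config.py may differ. add_disease and check_config are illustrative helpers, not functions from the repository.

```python
# Hypothetical sketch of the config variables described above.
DISEASES = [
    "Atelectasis", "Cardiomegaly", "Effusion", "Infiltration", "Mass",
    "Nodule", "Pneumonia", "Pneumothorax", "Consolidation", "Edema",
    "Emphysema", "Fibrosis", "Pleural_Thickening", "Hernia", "CovidPositive",
]
# Assumed structure: disease name -> integer index (0..14).
class_mapping = {name: idx for idx, name in enumerate(DISEASES)}

assemble_datasets = ['covidx', 'chestxray14']
class_interests = ['CovidPositive']
target_source = [0, ]


def add_disease(mapping, name):
    """Register a new disease under the next free index (hypothetical helper).

    Assumes the existing indices are contiguous and start at 0.
    """
    if name not in mapping:
        mapping[name] = len(mapping)
    return mapping[name]


def check_config(mapping, datasets, interests, sources):
    """Return a list of problems found in the settings (empty if none)."""
    problems = []
    for cls in interests:
        if cls not in mapping:
            problems.append(f"unknown class of interest: {cls}")
    for src in sources:
        # Assumption: each source is an index into assemble_datasets.
        if not 0 <= src < len(datasets):
            problems.append(f"target_source index out of range: {src}")
    return problems


if __name__ == "__main__":
    print(check_config(class_mapping, assemble_datasets,
                       class_interests, target_source))
```

For instance, calling add_disease(class_mapping, "Tuberculosis") in this sketch would assign the new class index 15, after which "Tuberculosis" could be listed in class_interests.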
Citation
@article{kang2021label,
title={Label-assemble: Leveraging multiple datasets with partial labels},
author={Kang, Mintong and Lu, Yongyi and Yuille, Alan L and Zhou, Zongwei},
journal={arXiv preprint arXiv:2109.12265},
year={2021}
}
@article{zhuassembling,
title={Assembling Existing Labels from Public Datasets to Diagnose Novel Diseases: COVID-19 in Late 2019},
author={Zhu, Zengle and Kang, Mintong and Yuille, Alan and Zhou, Zongwei},
journal={NeurIPS Workshop on Medical Imaging meets NeurIPS},
year={2022}
}