
[ISBI 2023] Official Implementation for Label-Assemble

Label-Assemble

We introduce a new initiative, "label-assemble", that assembles large-scale datasets from available data resources instead of piling up data and labels from scratch for every task, overcoming a deficiency of current annotation paradigms. We discover that learning from the classes in "negative examples" can better delimit the decision boundary of the class of interest. This discovery is the foundation of the "data, assemble" initiative and underlines the necessity of assembling multiple datasets with diverse (yet partial) labels. It also sheds new light on the computer-aided diagnosis of rare diseases and emerging pandemics, wherein "positive examples" are hard to collect, yet "negative examples" are relatively easy to assemble.

Paper

This repository provides the official implementation of the following papers:

Label-Assemble: Leveraging Multiple Datasets with Partial Labels
Mintong Kang¹, Yongyi Lu², Alan L. Yuille², and Zongwei Zhou²,*
¹ Zhejiang University, ² Johns Hopkins University
ISBI, 2023
paper | code | slides

Assembling Existing Labels from Public Datasets to Diagnose Novel Diseases: COVID-19 in Late 2019
Zengle Zhu¹, Mintong Kang², Alan L. Yuille³, and Zongwei Zhou³,*
¹ Tongji University, ² Zhejiang University, ³ Johns Hopkins University
Medical Imaging Meets NeurIPS, 2022
paper | code | slides

Dependencies

  • Linux
  • Python 3.6+
  • PyTorch 1.2+

Usage of Label-Assemble

1. Clone the repository

$ git clone https://github.com/MrGiovanni/LabelAssemble.git

2. Prepare the datasets

We mainly use two datasets, COVIDx and ChestX-ray14. You can download them from COVIDx-CXR2 and NIH, respectively.

3. Train the model

Once the datasets are ready, you can train the model:

$ bash run.sh

Running this script reproduces our experiments. Note that you should first change the parameters to match your own setup.

Train models from your own data

1. Implement Dataset Config

In config.py, there is a dict named CustomConfig.

CustomConfig = dict(
    train_img_path = None,
    val_img_path = None,
    test_img_path = None,
    train_file_path = None,
    val_file_path = None,
    test_file_path = None,
    class_num = None,
    class_filter = None,
    using_num = None
)

You need to fill in these fields:

  • train_img_path: the directory of the training set.
  • val_img_path: the directory of the validation set.
  • test_img_path: the directory of the test set.
  • train_file_path: the path of the training file.
  • val_file_path: the path of the validation file.
  • test_file_path: the path of the test file.
  • class_num: the number of classes.
  • class_filter: the classes that you need.
  • using_num: the total number of images that you want to use.
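
As an illustration, a filled-in CustomConfig might look like the sketch below. Every path, the class count, and the image budget are hypothetical placeholders. That class_filter takes a list of class names is also an assumption, so check config.py for the types it actually expects.

```python
# Hypothetical example values; replace them with your own paths and settings.
CustomConfig = dict(
    train_img_path="/data/custom/images/train",
    val_img_path="/data/custom/images/val",
    test_img_path="/data/custom/images/test",
    train_file_path="/data/custom/train.txt",
    val_file_path="/data/custom/val.txt",
    test_file_path="/data/custom/test.txt",
    class_num=2,
    # Assumption: class_filter is a list of class names.
    class_filter=["Pneumonia", "CovidPositive"],
    using_num=10000,
)

# Sanity check: no field should be left as None before training.
missing = [key for key, value in CustomConfig.items() if value is None]
if missing:
    raise ValueError(f"Unfilled CustomConfig fields: {missing}")
```

The final check is optional, but it catches a forgotten field before a long training run starts.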

2. Implement Dataset

In datasets.py, there is a class named CustomDataset, which is a subclass of BaseDataset. You need to implement one function: parse_line. Its input is one line of the train/val/test file, and its output is the image label and the image path. Remember that this function must be implemented.
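
A minimal sketch of such a parser is shown below. It is written as a standalone function, so the actual method signature in CustomDataset (for example, a leading self argument) may differ. The line format "<image file name> <label>" and the img_dir parameter are assumptions made for illustration, not the format used by the repository.

```python
import os


def parse_line(line, img_dir="/data/custom/images/train"):
    """Parse one annotation line into (label, image_path).

    Assumed (hypothetical) line format: "<image file name> <label>",
    separated by whitespace, e.g. "patient001.png CovidPositive".
    """
    fields = line.strip().split()
    if len(fields) != 2:
        raise ValueError(f"Expected 2 fields, got {len(fields)}: {line!r}")
    image_name, label = fields
    # Build the full image path from the image directory and the file name.
    image_path = os.path.join(img_dir, image_name)
    return label, image_path
```

If your annotation file uses a different layout (for example, comma-separated values or several labels per image), only the splitting logic needs to change, as long as the function still returns the label and the image path.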

3. Modify Config

In config.py, you should change three variables: assemble_datasets, class_interests, and target_source. For example:

assemble_datasets = ['covidx', 'chestxray14']

This means that COVIDx and ChestX-ray14 will be assembled.

class_interests = ['CovidPositive']

This means that the class of interest is CovidPositive.

target_source = [0, ]

This means that the dataset source we are interested in is 0. Moreover, our code currently covers 15 diseases:

 Atelectasis
 Cardiomegaly
 Effusion
 Infiltration
 Mass
 Nodule           
 Pneumonia
 Pneumothorax
 Consolidation
 Edema                  
 Emphysema
 Fibrosis
 Pleural_Thickening
 Hernia
 CovidPositive

If your dataset contains a new disease, you need to modify class_mapping.
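
To see how the three variables fit together, here is a minimal sketch. It assumes that each entry of target_source is an index into assemble_datasets, which is our reading of the example above rather than something the repository states. The names supported_classes, target_datasets, and unknown_classes are hypothetical helpers and do not appear in config.py.

```python
assemble_datasets = ['covidx', 'chestxray14']
class_interests = ['CovidPositive']
target_source = [0]

# The 15 disease names listed above.
supported_classes = [
    'Atelectasis', 'Cardiomegaly', 'Effusion', 'Infiltration', 'Mass',
    'Nodule', 'Pneumonia', 'Pneumothorax', 'Consolidation', 'Edema',
    'Emphysema', 'Fibrosis', 'Pleural_Thickening', 'Hernia', 'CovidPositive',
]

# Assumption: each entry of target_source indexes into assemble_datasets.
target_datasets = [assemble_datasets[i] for i in target_source]

# Classes of interest outside the supported list would need a new
# class_mapping entry before training.
unknown_classes = [c for c in class_interests if c not in supported_classes]

print("Target datasets:", target_datasets)
print("Classes needing a class_mapping entry:", unknown_classes)
```

A check like this can flag a misspelled class name or an out-of-range source index before any data is loaded.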

Citation

@article{kang2021label,
  title={Label-assemble: Leveraging multiple datasets with partial labels},
  author={Kang, Mintong and Lu, Yongyi and Yuille, Alan L and Zhou, Zongwei},
  journal={arXiv preprint arXiv:2109.12265},
  year={2021}
}

@article{zhuassembling,
  title={Assembling Existing Labels from Public Datasets to Diagnose Novel Diseases: COVID-19 in Late 2019},
  author={Zhu, Zengle and Kang, Mintong and Yuille, Alan and Zhou, Zongwei},
  journal={NeurIPS Workshop on Medical Imaging meets NeurIPS},
  year={2022}
}