COVID-19 Lung Segmentation

Authors: R. Biondi, N. Curti

This package isolates the lung region and identifies ground-glass lesions on chest CT scans of patients affected by COVID-19. The segmentation approach is based on color quantization, performed by K-means clustering. The package provides a series of scripts to isolate lung regions, pre-process the images, and estimate the K-means centroids and labels of the lung regions.

  • COVID-19 Lung Segmentation
    • Overview
    • Contents
    • Prerequisites
    • Installation
      • Testing
    • Usage
      • Download Data
      • Single Scan
      • Multiple Scans
        • Script
          • Train your own centroid set
        • Snakemake
        • Train Your Centroids
      • Evaluation
    • License
    • Contribution
    • References
    • Authors
    • Acknowledgments
    • Citation

Overview

COronaVirus Disease (COVID-19) has spread widely all over the world since the beginning of 2020. It is an acute, highly contagious viral infection mainly involving the respiratory system. Chest CT scans of patients affected by this condition have shown peculiar patterns of Ground Glass Opacities (GGO) and Consolidation (CS) related to the severity and the stage of the disease.

In this scenario, the correct and fast identification of these patterns is a fundamental task. Up to now, this task has been performed mainly with manual or semi-automatic techniques, which are time-consuming (hours or days) and depend on the operator's expertise.

This project provides an automatic pipeline for the segmentation of GGO areas on chest CT scans of patients affected by COVID-19. The segmentation is achieved with a color quantization algorithm based on k-means clustering, which groups voxels by color and texture similarity.

Example of segmentation. Left: original image. Right: original image with the identified ground-glass areas.
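As a rough illustration of the color-quantization idea, the sketch below runs a one-dimensional k-means on a handful of intensity values and then replaces each value with the index of its nearest centroid. It is a dependency-free toy, not the package's implementation: the function names `kmeans_1d` and `quantize`, the evenly spread initialization, and the sample values are all assumptions made for this example, and the real pipeline works on full CT volumes and also uses texture information.

```python
def kmeans_1d(values, k, n_iter=20):
    """Toy 1D k-means: return k centroids sorted in ascending order."""
    lo, hi = min(values), max(values)
    # Assumption: spread the initial centroids evenly over the value range.
    centroids = [lo + (hi - lo) * (i + 0.5) / k for i in range(k)]
    for _ in range(n_iter):
        # Assignment step: group each value with its nearest centroid.
        clusters = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        updated = [sum(c) / len(c) if c else centroids[i]
                   for i, c in enumerate(clusters)]
        if updated == centroids:
            break
        centroids = updated
    return sorted(centroids)


def quantize(values, centroids):
    """Replace each value with the index (label) of its nearest centroid."""
    return [min(range(len(centroids)), key=lambda i: abs(v - centroids[i]))
            for v in values]


# Hypothetical intensities: two dark voxels, two mid-range, two bright.
voxels = [-900.0, -880.0, -500.0, -480.0, 40.0, 60.0]
centroids = kmeans_1d(voxels, k=3)
labels = quantize(voxels, centroids)
print(centroids, labels)
```

Once a centroid set has been estimated, labeling a new scan only needs the `quantize` step, which is why centroids can be estimated once and reused.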

The pipeline was tested on 15 labeled chest CT scans, manually segmented by expert radiologists. The quality of the segmentation was estimated using the Dice (0.67 ± 0.12), Sensitivity (0.666 ± 0.15), Specificity (0.9993 ± 0.0005), and Precision (0.75 ± 0.20) scores.

These results make the pipeline suitable as an initialization for more accurate methods.

Contents

COVID-19 Lung Segmentation is composed of scripts and modules:

  • scripts isolate the lung regions, find the centroids for colour quantization, and segment the images.
  • modules load and save images in different formats and perform operations on image series.

For details, refer to the documentation of each script:

| Script | Description |
|--------|-------------|
| lung_extraction | Extract the lungs from CT scans |
| train | Apply colour quantization to a series of stacks to estimate the centroids to use for segmentation |
| labeling | Segment the input image by using the pre-estimated centroids or a user-provided set |
| evaluate | Compute metrics to evaluate the prediction against a ground truth |

For details, refer to the documentation of each module:

| Module | Description |
|--------|-------------|
| utils | methods to load, save, and preprocess stacks |
| method | methods to filter the image tensor |
| segmentation | useful functions to segment stacks of images and select the ROI |
| metrics | implementation of the evaluation metrics |

For each script, a PowerShell script and a shell script are provided to run it on the scans of multiple patients. A snakemake pipeline is also provided.

Prerequisites

Supported Python version: see the Python version badge of the project. Python 3.5, 3.6, and 3.7 are also supported but not tested.

First, make sure that the right Python version is installed.

This package uses opencv-python, numpy, and SimpleITK; see the requirements for more information.

The lung extraction is performed with a pre-trained UNet, so please make sure that the lungmask package is installed. For more information about how the network is trained, please refer to https://doi.org/10.1186/s41747-020-00173-2.

:warning: The OpenCV requirement binds the minimum Python version of this project to Python 3.5!

To run the tests you need to install PyTest and Hypothesis. Installation instructions are available at: PyTest, Hypothesis
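A quick way to check that the prerequisites listed above can be imported is sketched below. The helper `missing_modules` is a hypothetical name for this example, and the import names are assumptions based on the usual packaging (opencv-python is imported as `cv2`; the other names are assumed to match their package names).

```python
import importlib.util


def missing_modules(names):
    """Return the module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Assumed import names for the packages listed in the prerequisites.
REQUIRED = ["cv2", "numpy", "SimpleITK", "lungmask"]
# Needed only to run the tests.
TESTING = ["pytest", "hypothesis"]

for label, names in (("required", REQUIRED), ("testing", TESTING)):
    missing = missing_modules(names)
    status = "all found" if not missing else "missing " + ", ".join(missing)
    print(f"{label}: {status}")
```

The check only looks the modules up and does not import them, so it stays fast even when the heavier packages are installed.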

Installation

Download the project or the latest release:

git clone https://github.com/RiccardoBiondi/segmentation

Now you can install the package using pip:

pip install segmentation/

Testing

Testing routines use the PyTest and Hypothesis packages. Please install these packages to run the tests. To install the package in development mode, you also need to add these requirements:

  • pytest >= 3.0.7

  • hypothesis >= 4.13.0

:warning: pytest versions above 6.1.2 are not supported by python 3.5

A full set of tests is provided in the testing directory. You can run the full list of tests with:

python -m pytest

Usage

This module provides scripts to segment a single scan, to automate the segmentation for multiple patients, and to train your own centroid set. The following paragraphs show how to use all these features. As an example, we will use the public COVID-19 CT Lung and Infection Segmentation Dataset, published on Zenodo [5].

Download Data

First, we have to download and prepare the data. All the data will be stored and organized in a folder named Examples.

Download data into the Examples folder

using Bash:

  $ mkdir Examples
  $ wget https://zenodo.org/record/3757476/files/COVID-19-CT-Seg_20cases.zip -P ./Examples
  $ unzip ./Examples/COVID-19-CT-Seg_20cases.zip -d ./Examples/COVID-19-CT

Or PowerShell:


  PS \> New-Item  -Path . -Name "Examples" -ItemType "directory"
  PS \> Start-BitsTransfer -Source https://zenodo.org/record/3757476/files/COVID-19-CT-Seg_20cases.zip -Destination .\Examples\
  PS \> Expand-Archive -LiteralPath .\Examples\COVID-19-CT-Seg_20cases.zip -DestinationPath .\Examples\COVID-19-CT -Force

Single Scan

Once you have downloaded the data and installed the module, you can start to segment the images. Input CT scans must be in Hounsfield units (HU); grey-scale images are not allowed. The allowed input formats are the ones supported by SimpleITK. If the input is a DICOM series, pass the path to the directory containing the series files, and make sure that the folder contains only one series. The segmentation is saved as an nrrd file.

To segment a single CT scan, run the following from Bash or PowerShell:

   python -m CTLungSeg --input='./Examples/COVID-19-CT/coronacases_003.nii.gz'  --output='./Examples/coronacases_003_label.nrrd'
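If you prefer to launch the same command from Python, for instance inside a larger script, a minimal sketch could look like the following. The helpers `build_segment_command` and `segment_scan` are hypothetical names introduced here; only the `--input` and `--output` options come from the command above.

```python
import subprocess
import sys


def build_segment_command(input_path, output_path):
    """Build the argument list for the single-scan command shown above."""
    return [
        sys.executable, "-m", "CTLungSeg",
        f"--input={input_path}",
        f"--output={output_path}",
    ]


def segment_scan(input_path, output_path):
    """Run the segmentation; raises CalledProcessError if the command fails."""
    subprocess.run(build_segment_command(input_path, output_path), check=True)


cmd = build_segment_command(
    "./Examples/COVID-19-CT/coronacases_003.nii.gz",
    "./Examples/coronacases_003_label.nrrd",
)
print(" ".join(cmd))
```

Passing the arguments as a list avoids the quoting differences between Bash and PowerShell, and `sys.executable` keeps the call inside the interpreter where the package is installed.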

Multiple Scans

To segment the scans of multiple patients, you would have to repeat the segmentation process many times. We have automated this process with Bash (for Linux) and PowerShell (for Windows) scripts. We also provide a snakemake pipeline that runs the whole segmentation procedure in a multi-processing environment. The following paragraphs explain how to organize your data to benefit from this automation.

Script

To run the scripts, you have to organize the data into three folders:

  • input folder: contains all and only the CT scans to segment
  • temporary folder: empty folder. Will contain the scans after the lung segmentation
  • output folder: empty folder, will contain the labels files.

As an example, we will segment the coronacases_002 and coronacases_005 patients.

From bash:

  $ mkdir ./Examples/INPUT
  $ mkdir ./Examples/LUNG
  $ mkdir ./Examples/OUTPUT
  $ mv ./Examples/COVID-19-CT/coronacases_002.nii.gz ./Examples/COVID-19-CT/coronacases_005.nii.gz ./Examples/INPUT

or from PowerShell

  PS \> New-Item -Path "Examples" -Name "INPUT" -ItemType "directory"
  PS \> New-Item -Path "Examples" -Name "LUNG" -ItemType "directory"
  PS \> New-Item -Path "Examples" -Name "OUTPUT" -ItemType "directory"
  PS \> Move-Item -Path "Examples\COVID-19-CT\coronacases_002.nii.gz" -Destination "Examples\INPUT"
  PS \> Move-Item -Path "Examples\COVID-19-CT\coronacases_005.nii.gz" -Destination "Examples\INPUT"
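The same folder layout can also be prepared from Python, which behaves the same way on Linux and Windows. This is only a sketch: `prepare_folders` is a hypothetical helper, and the folder names simply mirror the commands above.

```python
import shutil
from pathlib import Path


def prepare_folders(root, source, scans):
    """Create INPUT, LUNG and OUTPUT under `root` and move `scans` into INPUT.

    Returns the names of the scans that were actually moved.
    """
    root = Path(root)
    for name in ("INPUT", "LUNG", "OUTPUT"):
        (root / name).mkdir(parents=True, exist_ok=True)
    moved = []
    for scan in scans:
        src = Path(source) / scan
        if src.exists():  # skip scans that are missing or already moved
            shutil.move(str(src), str(root / "INPUT" / scan))
            moved.append(scan)
    return moved


# Example call, matching the commands above:
# prepare_folders("Examples", "Examples/COVID-19-CT",
#                 ["coronacases_002.nii.gz", "coronacases_005.nii.gz"])
```

Because existing folders are accepted and missing scans are skipped, the helper can be run more than once without failing.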

Now you can proceed with the lung segmentation. To achieve this purpose run from PowerShell the script:

 PS \> ./lung_extraction.ps1 ./Examples/INPUT ./Examples/LUNG

Or its equivalent bash version:

  $ ./lung_extraction.sh ./Examples/INPUT ./Examples/LUNG

Once you have successfully isolated the lungs, you are ready to perform the GGO segmentation. Run the labeling script from PowerShell:

  PS \> ./labeling.ps1 ./Examples/LUNG ./Examples/OUTPUT

Or its corresponding bash version:

$ ./labeling.sh ./Examples/LUNG ./Examples/OUTPUT

Train your own centroid set

It is possible to train your centroid set instead of using the pre-trained one.

In this case, you have to prepare these folders:

  • TRAIN: will contain the scans in the training set
  • TLUNG: will store the scans after lung extraction

We will use coronacases_003 and coronacases_008 as the training set.

From bash:

  $ mkdir ./Examples/TRAIN
  $ mkdir ./Examples/TLUNG
  $ mv ./Examples/COVID-19-CT/coronacases_003.nii.gz ./Examples/COVID-19-CT/coronacases_008.nii.gz ./Examples/TRAIN

or PowerShell:

  PS \> New-Item -Path ".\Examples" -Name "TRAIN" -ItemType "directory"
  PS \> New-Item -Path ".\Examples" -Name "TLUNG" -ItemType "directory"
  PS \> Move-Item -Path ".\Examples\COVID-19-CT\coronacases_003.nii.gz" -Destination "Examples\TRAIN"
  PS \> Move-Item -Path ".\Examples\COVID-19-CT\coronacases_008.nii.gz" -Destination "Examples\TRAIN"

First, you have to perform the lung extraction on the training scans. As before, run:

  $ ./lung_extraction.sh ./Examples/TRAIN/ ./Examples/TLUNG/

or its corresponding PowerShell version. Now, to estimate the centroid set, run:

  $ ./train.sh ./Examples/TLUNG/ ./centroid.pkl.npy

or its corresponding PowerShell version.

Snakemake

If you have not installed snakemake, you can find the installation instructions here. To use the snakemake pipeline, you have to create two folders:

  • INPUT: contains all and only the CT scans to segment
  • OUTPUT: empty folder, will contain the segmented scans as nrrd.

As before, we will use the coronacases_002 and coronacases_005 patients as examples.

:notes: If you have already run the script version, these folders are ready.

Execute from bash

  $ mkdir ./Examples/INPUT
  $ mkdir ./Examples/OUTPUT
  $ mv ./Examples/COVID-19-CT/coronacases_002.nii.gz ./Examples/COVID-19-CT/coronacases_005.nii.gz ./Examples/INPUT

or PowerShell

  PS \> New-Item -Path "Examples" -Name "INPUT" -ItemType "directory"
  PS \> New-Item -Path "Examples" -Name "OUTPUT" -ItemType "directory"
  PS \> Move-Item -Path ".\Examples\COVID-19-CT\coronacases_002.nii.gz" -Destination "Examples\INPUT"
  PS \> Move-Item -Path ".\Examples\COVID-19-CT\coronacases_005.nii.gz" -Destination "Examples\INPUT"

Now, from command line, execute:

  snakemake --cores 1 --config input_path='./Examples/INPUT/' output_path='./Examples/OUTPUT/'

:notes: This command works in both Bash and PowerShell.

:warning: It will create a folder named LUNG inside the INPUT folder, which contains the results of the lung extraction step.

Train Your Centroids

As before, you can decide to train your own centroid set. To do this with the snakemake pipeline, you have to prepare three folders:

  • INPUT: will contain all the scans to segment
  • OUTPUT: will contain the segmented scans
  • TRAIN: will contain all the scans of the training set. (NOTE: cannot be the INPUT folder)

:warning: INPUT and TRAIN folder cannot be the same

:notes: This will first train the centroid set and then segment the scans in the input folder, so the INPUT folder is organized as before.

Now run snakemake with the following configuration parameters:

  snakemake --cores 1 --config input_path='./Examples/INPUT/' output_path='./Examples/OUTPUT/' train_path='./Examples/TRAIN/' centroid_path='./Examples/centroids.pkl.npy'
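When the pipeline is launched from another program, the command line can be assembled in Python and the INPUT/TRAIN constraint checked before anything runs. In this sketch `build_snakemake_command` is a hypothetical helper; the configuration keys are the ones used in the command above, and the folder check is an extra convenience, not something the pipeline is known to perform.

```python
from pathlib import Path


def build_snakemake_command(input_path, output_path, train_path=None,
                            centroid_path=None, cores=1):
    """Build the snakemake argument list, refusing INPUT == TRAIN."""
    config = [f"input_path={input_path}", f"output_path={output_path}"]
    if train_path is not None:
        # Compare resolved paths so './A' and 'A/' count as the same folder.
        if Path(input_path).resolve() == Path(train_path).resolve():
            raise ValueError("INPUT and TRAIN folders cannot be the same")
        config.append(f"train_path={train_path}")
    if centroid_path is not None:
        config.append(f"centroid_path={centroid_path}")
    return ["snakemake", "--cores", str(cores), "--config", *config]


cmd = build_snakemake_command(
    "./Examples/INPUT/", "./Examples/OUTPUT/",
    train_path="./Examples/TRAIN/",
    centroid_path="./Examples/centroids.pkl.npy",
)
print(" ".join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` from the folder that contains the Snakefile.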

Evaluation

This project also provides a script to evaluate the quality of the segmentation against the ground truth. The evaluation uses different metrics: Dice Coefficient, Sensitivity, Recall, Precision, and Accuracy. To run the evaluation procedure, run the following command from Bash or PowerShell:

   python -m CTLungSeg.evaluate --gt='/Path/To/GroundTruth.nii'  --pred='/Path/To/Prediction.nii'

This will print the achieved results on the command line. To store the results in a comma-separated (CSV) file, use the following command from Bash or PowerShell:

   python -m CTLungSeg.evaluate --gt='/Path/To/GroundTruth.nii'  --pred='/Path/To/Prediction.nii' --output='/Path/To/Output.csv'

Notice that the ground truth and the prediction must have the same shape. The images will be evaluated as binary images with a background value of 0.
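To make the binary evaluation concrete, the sketch below computes the usual overlap scores from two flattened masks, treating every non-zero voxel as foreground. It uses the standard textbook definitions and hypothetical helper names (`confusion_counts`, `scores`); the package's own metrics module may differ in details. Sensitivity and recall are the same quantity, so only one of them is computed.

```python
def confusion_counts(gt, pred):
    """Return (TP, FP, FN, TN) for two flattened masks; 0 is background."""
    if len(gt) != len(pred):
        raise ValueError("ground truth and prediction must have the same shape")
    tp = fp = fn = tn = 0
    for g, p in zip(gt, pred):
        g, p = g != 0, p != 0  # binarize: any non-zero voxel is foreground
        if g and p:
            tp += 1
        elif p:
            fp += 1
        elif g:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def scores(gt, pred):
    """Standard scores; assumes both masks contain foreground and background."""
    tp, fp, fn, tn = confusion_counts(gt, pred)
    return {
        "dice": 2 * tp / (2 * tp + fp + fn),
        "sensitivity": tp / (tp + fn),  # also known as recall
        "specificity": tn / (tn + fp),
        "precision": tp / (tp + fp),
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
    }


# Hypothetical flattened masks of eight voxels.
gt = [0, 0, 1, 1, 1, 0, 0, 0]
pred = [0, 1, 1, 1, 0, 0, 0, 0]
result = scores(gt, pred)
print(result)
```

In CT volumes the background dominates, so specificity and accuracy stay close to 1 even for a mediocre segmentation; Dice, sensitivity, and precision are the more informative scores.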

License

The COVID-19 Lung Segmentation package is licensed under the MIT "Expat" License.

Contribution

Any contribution is more than welcome. Just file an issue or open a pull request and we will check it ASAP!

See here for further information about how to contribute to this project.

References

1- Hofmanninger, J., Prayer, F., Pan, J. et al. Automatic lung segmentation in routine imaging is primarily a data diversity problem, not a methodology problem. Eur Radiol Exp 4, 50 (2020). https://doi.org/10.1186/s41747-020-00173-2.
2- Bradski, G. (2000). The OpenCV Library. Dr. Dobb's Journal of Software Tools.
3- Yaniv, Z., Lowekamp, B.C., Johnson, H.J. et al. SimpleITK Image-Analysis Notebooks: a Collaborative Environment for Education and Reproducible Research. J Digit Imaging 31, 290–303 (2018). https://doi.org/10.1007/s10278-017-0037-8.
4- Lowekamp, B., Chen, D., Ibanez, L., Blezek, D. The Design of SimpleITK. Frontiers in Neuroinformatics 7, 45 (2013). https://www.frontiersin.org/article/10.3389/fninf.2013.00045.
5- Ma Jun, Ge Cheng, Wang Yixin, An Xingle, Gao Jiantao, Yu Ziqi, Zhang Minqing, Liu Xin, Deng Xueyuan, Cao Shucheng, Wei Hao, Mei Sen, Yang Xiaoyu, Nie Ziwei, Li Chen, Tian Lu, Zhu Yuntao, Zhu Qiongjie, Dong Guoqiang, & He Jian. (2020). COVID-19 CT Lung and Infection Segmentation Dataset (Version 1.0) [Data set]. Zenodo. https://doi.org/10.5281/zenodo.3757476.

Authors

See also the list of contributors who participated in this project.

Acknowledgments

The authors acknowledge all the members of the Department of Radiology, IRCCS Azienda Ospedaliero-Universitaria di Bologna and the SIRM foundation, Italian Society of Medical and Interventional Radiology for the support in the development of the project and analysis of the data.

Citation

If you have found COVID-19 Lung Segmentation helpful in your research, please consider citing the original paper

@article{app11125438,
  author = {Biondi, Riccardo and Curti, Nico and Coppola, Francesca and Giampieri, Enrico and Vara, Giulio and Bartoletti, Michele and Cattabriga, Arrigo and Cocozza, Maria Adriana and Ciccarese, Federica and De Benedittis, Caterina and Cercenelli, Laura and Bortolani, Barbara and Marcelli, Emanuela and Pierotti, Luisa and Strigari, Lidia and Viale, Pierluigi and Golfieri, Rita and Castellani, Gastone},
  title = {Classification Performance for COVID Patient Prognosis from Automatic AI Segmentation—A Single-Center Study},
  journal = {Applied Sciences},
  volume = {11},
  year = {2021},
  number = {12},
  article-number = {5438},
  url = {https://www.mdpi.com/2076-3417/11/12/5438},
  issn = {2076-3417},
  doi = {10.3390/app11125438}
}

or just this project

@misc{COVID-19_Lung_Segmentation,
  author = {Biondi, Riccardo and Curti, Nico and Giampieri, Enrico and Castellani, Gastone},
  title = {COVID-19 Lung Segmentation},
  year = {2020},
  publisher = {GitHub},
  howpublished = {\url{https://github.com/RiccardoBiondi/segmentation}},
}