covid-chestxray-dataset

Recommended datasets for transfer learning

Open oplatek opened this issue 4 years ago • 3 comments

Hi @ieee8023

thank you for maintaining this dataset!

I implemented a PyTorch Lightning wrapper for a DenseNet model for covid-chestxray-dataset.

It is the kickoff of a PyTorch Lightning community project that aims to be a COVID-19 detector (for educational purposes).

Can you recommend datasets and strategies for using additional data?

I have scanned https://arxiv.org/pdf/2002.02497.pdf (I will return to it). It seems that to solve the labeling differences and other dataset preparation differences quite a lot of domain expertise is needed. Any tips appreciated.

Kind regards

Ondra

PS: I was inspired by #15.
PPS: My fork was merged into the PyTorchLightning community project.
PPPS: I believe that @borda has already contacted you to say that we can use Slack for longer discussions if needed. A link to the Slack can be found at PL.

oplatek, Mar 18 '20 00:03

Not a recommendation, but here are some links to datasets that I found:

  • NIH chest X-ray data
    • https://nihcc.app.box.com/v/ChestXray-NIHCC/file/219760887468
    • 112k images
  • Kaggle copy of NIH dataset
    • https://www.kaggle.com/nih-chest-xrays/data
  • Kaggle dataset for pneumonia in children
    • https://www.kaggle.com/paultimothymooney/chest-xray-pneumonia
    • 2538 bacterial, 1345 viral and 1349 normal
  • Stanford dataset
    • https://stanfordmlgroup.github.io/competitions/chexpert/
    • 224316 images from Stanford hospital, 16627 no finding
  • PadChest
    • http://bimcv.cipf.es/bimcv-projects/padchest/
    • 160k images from Valencia databank, many with manual labelling
  • NIH Tuberculosis collection
    • https://openi.nlm.nih.gov/faq#faq-tb-coll (Open the "[+] I have heard about the Tuberculosis collection.")
    • Montgomery County X-ray set: 80 normal, 58 abnormal
    • Shenzhen set: 340 normal, 275 abnormal
  • NIH open-i Indiana University collection:
    • https://openi.nlm.nih.gov/faq#faq-tb-coll (Open the "[+] Where can I get the Chest X-ray images in Open-i?")
    • 8121 images
  • MIMIC-CXR
    • https://physionet.org/content/mimic-cxr/2.0.0/
    • 377k DICOM images from BIDMC in Boston
    • Need to be a member of PhysioNet to access
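
The class counts quoted above for the Kaggle pneumonia set are uneven, which may matter when such data is combined with a much smaller COVID-19 set. As a rough sketch only (it is not a method proposed in this thread), inverse-frequency class weights can be derived from those counts. The function name and the weighting formula are assumptions chosen for illustration.

```python
# Class counts as quoted in the list above for the Kaggle
# "pneumonia in children" dataset.
counts = {"bacterial": 2538, "viral": 1345, "normal": 1349}


def inverse_frequency_weights(class_counts):
    """Return one weight per class: total / (n_classes * count).

    Rarer classes get larger weights. With this formula the weighted
    counts sum back to the total number of images.
    (Hypothetical helper, not part of any library mentioned here.)
    """
    total = sum(class_counts.values())
    n_classes = len(class_counts)
    return {
        name: total / (n_classes * count)
        for name, count in class_counts.items()
    }


weights = inverse_frequency_weights(counts)
for name, weight in weights.items():
    print(f"{name}: {weight:.3f}")
```

Weights like these could be passed to a weighted loss function; whether that helps on a given dataset mix would need to be checked empirically.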

I also found a list of public medical imaging data collections, not restricted to lung X-rays, at https://www.radrounds.com/profiles/blogs/list-of-open-access-medical-imaging-datasets

pkienzle, Apr 03 '20 16:04

Hi Oplatek and Pkienzle, thank you so much for the question and the answer! I'm currently trying to train a COVID-19 chest X-ray detection model. I have trained a classifier on the Kaggle dataset. Now I want to train it on COVID-19, pneumonia, and normal data. I plan to use the COVID-19 data from this repository, but I'm having difficulty finding normal data. Are there any references or suggestions?

Thank you so much!

ammarchalifah, Jul 13 '20 04:07

Please check out this paper for a transfer learning approach and tasks to work on (and the clinical workflows that could benefit from tools): http://arxiv.org/abs/2006.11988

Also check out this library for dataloaders for over 7 different datasets: https://github.com/mlmed/torchxrayvision
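
Independently of that library, the labeling differences raised at the top of this thread can be illustrated in plain Python. This is a minimal sketch under stated assumptions: the dataset keys, the raw label strings, and the helper names `harmonize` and `merge` are all hypothetical, and a real mapping would need review by someone with domain expertise.

```python
# Hypothetical per-dataset label vocabularies mapped onto a shared set.
# Real datasets use their own label names; check their documentation.
LABEL_MAP = {
    "dataset_a": {"No Finding": "normal", "Pneumonia": "pneumonia"},
    "dataset_b": {"NORMAL": "normal", "BACTERIA": "pneumonia", "VIRUS": "pneumonia"},
    "dataset_c": {"COVID-19": "covid"},
}

# Target classes for a three-way classifier; the index is the class id.
TARGET_CLASSES = ("normal", "pneumonia", "covid")


def harmonize(source, raw_label):
    """Return the shared class name for a (dataset, label) pair, or None if unmapped."""
    return LABEL_MAP.get(source, {}).get(raw_label)


def merge(records):
    """Turn (source, image_path, raw_label) triples into (image_path, class_id) pairs.

    Records whose label has no mapping are dropped rather than guessed.
    """
    merged = []
    for source, path, raw_label in records:
        label = harmonize(source, raw_label)
        if label is not None:
            merged.append((path, TARGET_CLASSES.index(label)))
    return merged


records = [
    ("dataset_a", "a/001.png", "No Finding"),
    ("dataset_b", "b/017.jpeg", "VIRUS"),
    ("dataset_c", "c/003.png", "COVID-19"),
    ("dataset_a", "a/002.png", "Edema"),  # unmapped, so it is skipped
]
print(merge(records))
```

Dropping unmapped labels keeps the merged set conservative; the alternative of folding them into an "other" class is a design choice that depends on the task.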

ieee8023, Jul 13 '20 05:07