covid-chestxray-dataset
Recommended datasets for transfer learning
Hi @ieee8023
thank you for maintaining this dataset!
I implemented a PyTorch Lightning wrapper for a DenseNet model for covid-chestxray-dataset.
It is the kickoff of a PyTorch Lightning community project that aims to be a COVID-19 detector (for educational purposes).
Can you recommend datasets and strategies for using additional data?
I have skimmed https://arxiv.org/pdf/2002.02497.pdf (I will return to it). It seems that resolving the labeling differences and other dataset preparation differences requires quite a lot of domain expertise. Any tips are appreciated.
Kind regards
Ondra
PS: I was inspired by #15
PPS: My fork was merged into the PyTorchLightning community project
PPPS: I believe that @borda already contacted you about using Slack for longer discussions if needed. A link to the Slack can be found at PL
Not a recommendation, but here are some links to datasets that I found:
- NIH chest X-ray data
  - https://nihcc.app.box.com/v/ChestXray-NIHCC/file/219760887468
  - 112k images
- Kaggle copy of NIH dataset
  - https://www.kaggle.com/nih-chest-xrays/data
- Kaggle dataset for pneumonia in children
  - https://www.kaggle.com/paultimothymooney/chest-xray-pneumonia
  - 2538 bacterial, 1345 viral and 1349 normal
- Stanford dataset
  - https://stanfordmlgroup.github.io/competitions/chexpert/
  - 224316 images from Stanford hospital, 16627 no finding
- PadChest
  - http://bimcv.cipf.es/bimcv-projects/padchest/
  - 160k images from Valencia databank, many with manual labelling
- NIH Tuberculosis collection
  - https://openi.nlm.nih.gov/faq#faq-tb-coll (Open the "[+] I have heard about the Tuberculosis collection.")
  - Montgomery County X-ray set: 80 normal, 58 abnormal
  - Shenzhen set: 340 normal, 275 abnormal
- NIH open-i Indiana University collection
  - https://openi.nlm.nih.gov/faq#faq-tb-coll (Open the "[+] Where can I get the Chest X-ray images in Open-i?")
  - 8121 images
- MIMIC-CXR
  - https://physionet.org/content/mimic-cxr/2.0.0/
  - 377k DICOM images from BIDMC in Boston
  - Need to be a member of PhysioNet to access
I also found a list of public medical imaging data collections, not restricted to lung X-rays, at https://www.radrounds.com/profiles/blogs/list-of-open-access-medical-imaging-datasets
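On the labeling differences raised in the original question: one possible starting point is to map each dataset's own labels onto a small shared vocabulary before mixing data. The sketch below is only an illustration, not a vetted mapping. The dataset keys, the `harmonize` helper, and the raw label strings are all assumptions made for the example; the real label names have to be checked against each dataset's metadata, and whether "abnormal" in the tuberculosis sets should count as its own class is a domain decision.

```python
# Hypothetical shared vocabulary; adjust to the task at hand.
COMMON_LABELS = ("normal", "pneumonia", "tuberculosis", "other")

# Assumed raw label strings per dataset (illustrative only, verify against
# each dataset's metadata before relying on them).
LABEL_MAP = {
    "nih": {"No Finding": "normal", "Pneumonia": "pneumonia"},
    "kaggle_pneumonia": {"NORMAL": "normal", "PNEUMONIA": "pneumonia"},
    "shenzhen": {"normal": "normal", "abnormal": "tuberculosis"},
}


def harmonize(dataset: str, raw_label: str) -> str:
    """Map a dataset-specific label to the shared vocabulary.

    Unknown datasets or labels fall back to "other" so they can be
    reviewed or dropped explicitly instead of being silently mislabeled.
    """
    return LABEL_MAP.get(dataset, {}).get(raw_label.strip(), "other")


if __name__ == "__main__":
    for ds, label in [("nih", "No Finding"), ("shenzhen", "abnormal"), ("nih", "Cardiomegaly")]:
        print(ds, label, "->", harmonize(ds, label))
```

Falling back to "other" rather than raising keeps unmapped findings visible, which seems safer than guessing when the label semantics differ between sources.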
Hi Oplatek and Pkienzle, thank you so much for the question and the answer! I'm currently trying to train a COVID chest X-ray detection model. I have trained the classifier with the Kaggle dataset. Now I want to train it on COVID, pneumonia, and normal data. I plan to use the COVID data from this repository, but I'm having difficulty finding normal data. Do you have any references or suggestions?
Thank you so much!
Please check out this paper for a transfer learning approach and tasks to work on (and the clinical workflows that could benefit from tools): http://arxiv.org/abs/2006.11988
Also check out this library for dataloaders for over 7 different datasets: https://github.com/mlmed/torchxrayvision
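For the multi-class setup asked about above, the class counts of the combined sources will probably be uneven, so some form of class weighting may help. Here is a minimal sketch of inverse-frequency weights, using the counts quoted earlier in this thread for the Kaggle pediatric pneumonia dataset. The function name `inverse_frequency_weights` is hypothetical, and this weighting scheme is just one common choice, not something the paper or library above prescribes.

```python
from typing import Dict


def inverse_frequency_weights(counts: Dict[str, int]) -> Dict[str, float]:
    """Return one weight per class, proportional to 1 / class frequency.

    Weights are scaled so that a perfectly balanced dataset gives 1.0
    for every class.
    """
    if any(c <= 0 for c in counts.values()):
        raise ValueError("every class needs at least one example")
    total = sum(counts.values())
    n_classes = len(counts)
    return {label: total / (n_classes * c) for label, c in counts.items()}


# Counts as listed in this thread for the Kaggle pneumonia dataset.
kaggle_counts = {"bacterial": 2538, "viral": 1345, "normal": 1349}
weights = inverse_frequency_weights(kaggle_counts)

if __name__ == "__main__":
    for label, w in weights.items():
        print(f"{label}: {w:.3f}")
```

Weights like these could then be passed to a weighted loss, for example the `weight` argument of `torch.nn.CrossEntropyLoss`, in the class order the model uses.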