Add basic information for collections in TCIA tutorial
Suggestions from Kirby:
It would be helpful to let people obtain some basic info about the Collections (patients/modalities/body parts) from within the notebook. This would give them a high-level view of the datasets without having to leave the notebook.
This code could be added to address this request:
# list collections with patient counts, modalities and body parts examined
import requests

base_url = "https://services.cancerimagingarchive.net/nbia-api/services/v1/"
collection_data = requests.get(base_url + "getCollectionValues").json()
for x in collection_data:
    collectionName = x['Collection']
    # unique patient IDs in this collection
    patient_url = base_url + "getPatient?Collection=" + collectionName
    patients = requests.get(patient_url).json()
    count_PatientIds = set(item['PatientId'] for item in patients)
    # imaging modalities present in this collection
    modality_url = base_url + "getModalityValues?Collection=" + collectionName
    modalities = requests.get(modality_url).json()
    clean_modalities = set(item['Modality'] for item in modalities)
    # body parts examined; an empty entry means the value was not specified
    bodyPart_url = base_url + "getBodyPartValues?Collection=" + collectionName
    bodyParts = requests.get(bodyPart_url).json()
    clean_bodyParts = set()
    for item in bodyParts:
        if len(item):
            clean_bodyParts.add(item['BodyPartExamined'])
        else:
            clean_bodyParts.add('Not Specified')
    print(collectionName, 'has', len(count_PatientIds), 'patients,', clean_modalities, 'modalities, and', clean_bodyParts, 'anatomic entities')
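If it helps, the same lookup could also be factored into a small helper that returns the summary as a dictionary, which is easier to reuse or test than printed output. The following is only a sketch under some assumptions: the names `fetch_json`, `summarize_collection` and `print_all_summaries` are hypothetical, it uses the standard-library `urllib` instead of `requests` so that it has no extra dependencies, and it assumes the endpoints return the same JSON fields (`PatientId`, `Modality`, `BodyPartExamined`) used in the snippet above.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "https://services.cancerimagingarchive.net/nbia-api/services/v1"


def fetch_json(endpoint, params=None):
    """GET one endpoint and decode its JSON body (an empty body becomes [])."""
    url = f"{BASE_URL}/{endpoint}"
    if params:
        # urlencode escapes spaces and other special characters in the values
        url += "?" + urlencode(params)
    with urlopen(url, timeout=60) as response:
        body = response.read().decode("utf-8")
    return json.loads(body) if body.strip() else []


def summarize_collection(name, fetch=fetch_json):
    """Return patient count, modalities and body parts for one collection.

    `fetch` is injectable (hypothetical design choice) so the function can
    be exercised with canned data instead of live requests.
    """
    query = {"Collection": name}
    patients = {p["PatientId"] for p in fetch("getPatient", query) if p.get("PatientId")}
    modalities = {m["Modality"] for m in fetch("getModalityValues", query) if m.get("Modality")}
    # entries without a BodyPartExamined value are reported as 'Not Specified'
    body_parts = {b.get("BodyPartExamined") or "Not Specified"
                  for b in fetch("getBodyPartValues", query)}
    return {
        "collection": name,
        "patients": len(patients),
        "modalities": sorted(modalities),
        "body_parts": sorted(body_parts),
    }


def print_all_summaries(fetch=fetch_json):
    """Print one summary per collection (issues several requests per collection)."""
    for entry in fetch("getCollectionValues"):
        print(summarize_collection(entry["Collection"], fetch=fetch))
```

Calling `print_all_summaries()` in a notebook cell would then list every collection, while `summarize_collection("QIN-PROSTATE-Repeatability")` would look up only the dataset used in the tutorial.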
Please make sure you comply with the citation and data usage policy of the dataset used in the notebook; see https://wiki.cancerimagingarchive.net/display/Public/QIN-PROSTATE-Repeatability#33948380fcf5b2d073de4a679a77aebfd9d37008.
Please update the "Brief Introduction" section to the following:
Summary
The Cancer Imaging Archive (TCIA) is a service which de-identifies and hosts a large publicly available archive of medical images of cancer. TCIA is funded by the Cancer Imaging Program (CIP), a part of the United States National Cancer Institute (NCI), and is managed by the Frederick National Laboratory for Cancer Research (FNLCR).
A full list of TCIA's Collections can be found at https://www.cancerimagingarchive.net/collections/. Documentation about TCIA's REST APIs can be found at https://wiki.cancerimagingarchive.net/x/NIIiAQ.
Acknowledgements
If you leverage this notebook or any TCIA datasets in your work, please be sure to comply with the TCIA Data Usage Policy by citing the dataset and associated publications. You can find more information about citing the "QIN-Prostate-Repeatability" dataset used in this notebook at http://doi.org/10.7937/K9/TCIA.2018.MR1CKGND.
@kirbyju my thinking is that the notebook is an artifact in its own right that already uses data from TCIA, so citations that satisfy the data usage policy belong at the bottom of the notebook. What do you think?
Yeah, I can see an argument for that. So maybe add a "References" section at the end of the notebook that spells all of those citations out? It seems like it would be a lot to include in the "Brief Introduction".
(Just to mention: while the repo maintainers can definitely help update the contents, please feel free to create pull requests to propose textual changes. This is an open source project under the Apache License 2.0, see https://github.com/Project-MONAI/tutorials/blob/main/LICENSE.)