GastroVision
GastroVision: A Multi-class Endoscopy Image Dataset for Computer Aided Gastrointestinal Disease Detection https://drive.google.com/drive/folders/1T35gqO7jIKNxC-gVA2YVOMdsL7PSqeAa?usp=sharing
This repository provides related links and code for the GastroVision dataset, a multi-class endoscopy image dataset covering the most significant anatomical landmarks, pathological abnormalities, and normal findings in the gastrointestinal (GI) tract. It contains 8,000 images from 27 classes, acquired from the upper and lower GI tracts. The classes include Accessory tools, Normal stomach, Duodenal bulb, Gastroesophageal junction, Normal z-line, Pylorus, Normal mucosa and vascular pattern (large bowel), Cecum, Appendiceal orifice, Ileocecal valve, Small bowel, Retroflex-rectum, Terminal ileum, Colon diverticula, Colorectal cancer, Resection margins, Dyed-lifted-polyp, and Dyed-resection-margin. Classes that may be particularly important in clinical settings include:
Key Classes to explore in the GastroVision dataset
- Angiectasia: Abnormal blood vessels causing recurrent GI bleeding.
- Barrett’s Esophagus: Precancerous changes in the lining of the esophagus.
- Blood in Lumen: Signifier of GI bleeding.
- Colon Polyps: Potential early indicators of colorectal cancer.
- Esophagitis: Severity-graded inflammation of the esophagus.
- Esophageal Varices: Swollen veins caused by portal hypertension.
- Erythema: Redness of the mucosa indicating inflammation.
- Gastric Polyps: Abnormal growths in the stomach lining.
- Ulcerative Colitis: Chronic inflammatory bowel disease.
- Ulcers: Open sores in the stomach or duodenum.
The dataset can be downloaded from https://osf.io/84e7f/.
Alternatively, it can be downloaded from Google Drive: https://drive.google.com/drive/folders/1T35gqO7jIKNxC-gVA2YVOMdsL7PSqeAa?usp=sharing
data:image/s3,"s3://crabby-images/1405c/1405c2f5ccd7cf7746745ad1ee1de035f475e54d" alt=""
GastroVision key use cases
- Gastrointestinal Image Analysis: Aid in diagnosing gastrointestinal conditions and in recognizing anatomical landmarks.
- Deep Learning Research: Train classification models, particularly models that must cope with class imbalance.
- Rare Anomaly Detection: Use AI to detect less frequent but clinically significant anomalies.
- Benchmarking: Serve as a standard for evaluating and comparing medical image analysis algorithms.
- Transfer Learning: Serve as a source dataset for pre-training gastrointestinal disease detection models.
Note:
GastroVision provides images with class labels assigned by medical experts. It does not include pixel-level ground truth (segmentation masks) or bounding box annotations.
- The metadata for the dataset can be found in GastroVision_metadata.csv, which contains the filename, class, width, height, and size of images.
- The dataset split used in the paper is provided in the Split folder. This split contains 22 classes, because the experiments used only classes with more than five samples. Details of the remaining classes are available in the GastroVision_metadata.csv file. Use the provided split to reproduce the results presented in the paper and to ensure a fair comparison.
- The code files used for the experiments reported in the paper are provided in the Source folder.
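To get a quick view of class balance, the metadata file can be summarized with Python's standard `csv` module. The sketch below counts images per class and keeps classes with more than five images, mirroring the selection rule mentioned above. The column name `class` is an assumption based on the fields listed for GastroVision_metadata.csv (check the real header first), and the function names are hypothetical. This does not regenerate the provided Split folder, which should still be used for comparisons with the paper.

```python
import csv
from collections import Counter
from typing import TextIO


def count_classes(csv_file: TextIO, class_column: str = "class") -> Counter:
    """Count images per class in a metadata CSV.

    The column name "class" is an assumption; check the real header first.
    """
    reader = csv.DictReader(csv_file)
    return Counter(row[class_column] for row in reader)


def classes_above_threshold(counts: Counter, min_exclusive: int = 5) -> list:
    """Return classes with strictly more than `min_exclusive` images, most frequent first."""
    return [name for name, n in counts.most_common() if n > min_exclusive]


def summarize_metadata(path: str) -> None:
    """Print class and image totals for a local copy of the metadata file (path is hypothetical)."""
    with open(path, newline="", encoding="utf-8") as f:
        counts = count_classes(f)
    kept = classes_above_threshold(counts)
    print(f"{len(counts)} classes, {sum(counts.values())} images")
    print(f"{len(kept)} classes have more than five images")
```

For example, `summarize_metadata("GastroVision_metadata.csv")` prints the totals for a downloaded copy of the file, assuming it sits in the working directory.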
How to run
Train and Test:
python3 model_filename.py -e(epochs) -b(batch size) -l(learning rate)
E.g., python3 DenseNet-121_pretrained.py -e50 -b32 -l0.0001
model_filename.py refers to one of "DenseNet-121_pretrained.py", "DenseNet-169_pretrained.py", "EfficientNet-B0_pretrained.py", "ResNet-50_pretrained.py", "ResNet-50_endtoend.py", or "ResNet-152_pretrained.py", provided in the Source folder.
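How the scripts in the Source folder read their arguments is not described here. As a hedged illustration only, the sketch below shows one way the `-e`, `-b`, and `-l` flags could be parsed with `argparse`; the long option names and the defaults (taken from the example command) are hypothetical. `argparse` accepts a value attached directly to a short option, so `-e50` is read the same as `-e 50`.

```python
import argparse


def parse_args(argv=None) -> argparse.Namespace:
    """Parse training hyperparameters (long names and defaults are hypothetical)."""
    parser = argparse.ArgumentParser(description="Train and test a GastroVision classifier (sketch).")
    parser.add_argument("-e", "--epochs", type=int, default=50, help="number of training epochs")
    parser.add_argument("-b", "--batch-size", type=int, default=32, help="mini-batch size")
    parser.add_argument("-l", "--learning-rate", type=float, default=1e-4, help="optimizer learning rate")
    return parser.parse_args(argv)
```

Calling `parse_args(["-e50", "-b32", "-l0.0001"])` yields `epochs=50`, `batch_size=32`, and `learning_rate=0.0001`.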
Set path:
In model_filename.py (lines 49-52):
- Set the paths to the Train, Validation, and Test folders (lines 49-51).
- Set model_path (line 52) to the folder where you want to store the model checkpoints.
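Before a long training run, it can help to check that these paths are valid. The sketch below is hypothetical: apart from `model_path`, the field names are illustrative, and it assumes each split folder holds one sub-folder per class (the layout that class-per-folder loaders such as torchvision's `ImageFolder` expect), which is not confirmed for the Split folder.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass
class RunPaths:
    """Hypothetical grouping of the four paths configured in the training script."""
    train_dir: Path
    val_dir: Path
    test_dir: Path
    model_path: Path  # folder where checkpoints are written

    def check(self) -> dict:
        """Verify the split folders exist, create the checkpoint folder, and count class sub-folders."""
        class_counts = {}
        splits = (("train", self.train_dir), ("val", self.val_dir), ("test", self.test_dir))
        for name, split_dir in splits:
            if not split_dir.is_dir():
                raise FileNotFoundError(f"{name} folder not found: {split_dir}")
            # Assumes one sub-folder per class inside each split folder.
            class_counts[name] = sum(1 for p in split_dir.iterdir() if p.is_dir())
        self.model_path.mkdir(parents=True, exist_ok=True)
        return class_counts
```

If the split is laid out as assumed, all three counts should match the 22 classes used in the paper.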
Dataset Details
GastroVision, with two broad categories (upper GI and lower GI), covers 27 classes belonging to anatomical landmarks or pathological findings. The categorization of these classes is shown in the diagram below.
data:image/s3,"s3://crabby-images/e0bd6/e0bd678d9425baf31e6bf2dab4b07a44b46022e2" alt=""
A detailed distribution of each class is represented in the graph below:
data:image/s3,"s3://crabby-images/e1f6a/e1f6a8b1539fb6dbe38f49241aee20811cb1ff4d" alt=""
data:image/s3,"s3://crabby-images/768a0/768a0e52b00db72e48742f757cc9c61ece66d292" alt=""
Evaluation Metrics
Standard multi-class classification metrics, such as Matthews Correlation Coefficient (MCC), micro and macro averages of recall/sensitivity, precision, and F1-score, can be used to validate the performance using our dataset.
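All of these metrics can be derived from a confusion matrix. The sketch below computes them in plain Python for single-label multi-class predictions, using the multi-class generalization of MCC (the same formula scikit-learn's `matthews_corrcoef` uses). The function names are illustrative; for real experiments, `sklearn.metrics` provides `matthews_corrcoef`, `precision_recall_fscore_support`, and `classification_report`. Per-class scores with a zero denominator are set to 0 here, which is one common convention.

```python
import math
from typing import Sequence


def confusion_matrix(y_true: Sequence[int], y_pred: Sequence[int], num_classes: int) -> list:
    """Rows are true classes, columns are predicted classes."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm


def _safe_div(a: float, b: float) -> float:
    return a / b if b else 0.0


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int], num_classes: int) -> dict:
    cm = confusion_matrix(y_true, y_pred, num_classes)
    total = sum(map(sum, cm))
    true_counts = [sum(row) for row in cm]  # t_k: samples whose true class is k
    pred_counts = [sum(cm[r][k] for r in range(num_classes)) for k in range(num_classes)]  # p_k
    correct = sum(cm[k][k] for k in range(num_classes))

    precision = [_safe_div(cm[k][k], pred_counts[k]) for k in range(num_classes)]
    recall = [_safe_div(cm[k][k], true_counts[k]) for k in range(num_classes)]
    f1 = [_safe_div(2 * p * r, p + r) for p, r in zip(precision, recall)]

    # Multi-class MCC: (c*s - sum_k p_k t_k) / sqrt((s^2 - sum_k p_k^2) * (s^2 - sum_k t_k^2))
    cov_tp = correct * total - sum(t * p for t, p in zip(true_counts, pred_counts))
    cov_pp = total ** 2 - sum(p * p for p in pred_counts)
    cov_tt = total ** 2 - sum(t * t for t in true_counts)
    mcc = _safe_div(cov_tp, math.sqrt(cov_pp * cov_tt))

    # For single-label data, micro precision, recall, and F1 all equal accuracy.
    micro = _safe_div(correct, total)
    return {
        "mcc": mcc,
        "micro_precision": micro,
        "micro_recall": micro,
        "micro_f1": micro,
        # Macro averages are taken over all num_classes labels.
        "macro_precision": sum(precision) / num_classes,
        "macro_recall": sum(recall) / num_classes,
        "macro_f1": sum(f1) / num_classes,
    }
```

Macro averaging weights every class equally, so rare classes influence the score as much as frequent ones, which matters for an imbalanced dataset like this one; the micro averages are dominated by the frequent classes.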
Baseline results
Results for all classification experiments on the GastroVision dataset.
Class-wise performance associated with the best outcome obtained using pre-trained DenseNet-121.
data:image/s3,"s3://crabby-images/1eab8/1eab824d25070fa5898c29ea8121d6c9d9835c3a" alt=""
data:image/s3,"s3://crabby-images/5f775/5f775db153da85377f5f9d155e6d260926efa191" alt=""
Cite
If you use this dataset in your research work, please cite the following paper:
@inproceedings{jha2023gastrovision,
title={GastroVision: A Multi-class Endoscopy Image Dataset for Computer Aided Gastrointestinal Disease Detection},
author={Debesh Jha and Vanshali Sharma and Neethi Dasu and Nikhil Kumar Tomar and Steven Hicks and M.K. Bhuyan and Pradip K. Das and Michael A. Riegler and P{\aa}l Halvorsen and Thomas de Lange and Ulas Bagci},
booktitle={ICML Workshop on Machine Learning for Multimodal Healthcare Data (ML4MHD 2023)},
year={2023}
}
Contact
Please contact [email protected], [email protected] and [email protected] if you have questions about the dataset and our research activities. We always welcome collaboration and joint research!