PubMedCLIP
Fine-tuning CLIP on the ROCO dataset, which contains image-caption pairs from PubMed articles.
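CLIP is trained with a symmetric contrastive objective: within a batch, each image should score highest against its own caption, and each caption against its own image. Assuming PubMedCLIP keeps that objective when fine-tuning on ROCO pairs, the pure-Python sketch below illustrates what is being optimized on toy embeddings. It is not the repository's training code. The helper names and the fixed temperature of 0.07 are assumptions for illustration (CLIP itself learns the temperature).

```python
import math
from typing import List, Sequence


def _normalize(v: Sequence[float]) -> List[float]:
    """Scale a vector to unit L2 norm so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def _cross_entropy(logits: Sequence[float], target: int) -> float:
    """Numerically stable softmax cross-entropy for one row of logits."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]


def clip_contrastive_loss(image_emb, text_emb, temperature=0.07):
    """Symmetric InfoNCE loss: image i should match caption i (assumed fixed temperature)."""
    imgs = [_normalize(v) for v in image_emb]
    txts = [_normalize(v) for v in text_emb]
    n = len(imgs)
    # logits[i][j] = cos(image_i, caption_j) / temperature
    logits = [[sum(a * b for a, b in zip(imgs[i], txts[j])) / temperature
               for j in range(n)] for i in range(n)]
    # Image-to-text: each row is a classification over all captions in the batch.
    loss_i2t = sum(_cross_entropy(logits[i], i) for i in range(n)) / n
    # Text-to-image: each column is a classification over all images in the batch.
    loss_t2i = sum(_cross_entropy([logits[i][j] for i in range(n)], j)
                   for j in range(n)) / n
    return (loss_i2t + loss_t2i) / 2


# Toy usage: correctly paired captions give a lower loss than shuffled ones.
images = [[1.0, 0.0], [0.0, 1.0]]
captions = [[0.9, 0.1], [0.1, 0.9]]
aligned = clip_contrastive_loss(images, captions)
swapped = clip_contrastive_loss(images, captions[::-1])
print(aligned < swapped)  # True
```

Averaging the two directions makes the loss symmetric, so neither the image encoder nor the text encoder is favored during fine-tuning.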
PubMedCLIP in Medical Visual Question Answering
This repository includes PubMedCLIP, a version of CLIP fine-tuned on ROCO image-caption pairs. We also provide pipelines for incorporating PubMedCLIP as an alternative pre-trained visual encoder in the MEVF and QCR medical visual question answering models. Our experiments show that PubMedCLIP yields up to a 3% improvement in medical visual question answering.
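Using PubMedCLIP as an alternative visual encoder amounts to swapping the image feature extractor while leaving the rest of the VQA model unchanged. The sketch below shows that pattern with a toy classification-style VQA head. `ToyVQAModel`, the element-wise product fusion, and the two stand-in encoders are all hypothetical and far simpler than the MEVF and QCR architectures; they only illustrate the plug-in interface.

```python
from typing import Callable, Dict, List, Sequence

# Hypothetical interface: an encoder maps a preprocessed image to a feature vector.
VisualEncoder = Callable[[Sequence[float]], List[float]]


class ToyVQAModel:
    """Minimal answer-classification VQA head with a swappable visual encoder."""

    def __init__(self, visual_encoder: VisualEncoder,
                 answer_prototypes: Dict[str, List[float]]):
        self.visual_encoder = visual_encoder
        self.answer_prototypes = answer_prototypes

    def answer(self, image: Sequence[float], question_feat: Sequence[float]) -> str:
        image_feat = self.visual_encoder(image)
        if len(image_feat) != len(question_feat):
            raise ValueError("visual and question features must have the same dimension")
        # Element-wise product: a placeholder for the attention-based fusion real models use.
        fused = [v * q for v, q in zip(image_feat, question_feat)]
        # Score each candidate answer and return the highest-scoring one.
        scores = {a: sum(f * p for f, p in zip(fused, proto))
                  for a, proto in self.answer_prototypes.items()}
        return max(scores, key=scores.get)


def generic_encoder(image: Sequence[float]) -> List[float]:
    """Stand-in for a general-domain encoder: passes the input through unchanged."""
    return list(image)


def domain_encoder(image: Sequence[float]) -> List[float]:
    """Stand-in for a different encoder; it just reverses the input so the swap is visible."""
    return list(reversed(image))


# Same head, same question, different encoder plugged in.
prototypes = {"yes": [1.0, 0.0], "no": [0.0, 1.0]}
print(ToyVQAModel(generic_encoder, prototypes).answer([0.8, 0.2], [1.0, 1.0]))  # yes
print(ToyVQAModel(domain_encoder, prototypes).answer([0.8, 0.2], [1.0, 1.0]))   # no
```

In a real pipeline the encoder's output dimension must match what the fusion layer expects, which is why the sketch checks the dimensions explicitly before fusing.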
Citation
If you use this work in an academic publication, please cite the paper by Sedigheh Eslami, Gerard de Melo, and Christoph Meinel:
Sedigheh Eslami, Gerard de Melo, Christoph Meinel (2021).
Does CLIP Benefit Visual Question Answering in the Medical Domain as Much as it Does in the General Domain?
arXiv e-prints 2112.13906, 2021.
BibTeX entry:
@inproceedings{eslami2023pubmedclip,
title={PubMedCLIP: How Much Does CLIP Benefit Visual Question Answering in the Medical Domain?},
author={Eslami, Sedigheh and Meinel, Christoph and De Melo, Gerard},
booktitle={Findings of the Association for Computational Linguistics: EACL 2023},
pages={1151--1163},
year={2023}
}