biomedical
Tools for curating biomedical training data for large-scale language modeling
Closes #502. This is a QA dataset that supports two languages, en and es, so there are two subsets containing the same questions: `head_qa_en` and `head_qa_es`. I also implemented a...
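The one-subset-per-language naming described in this PR can be illustrated with a minimal sketch. Only the subset names `head_qa_en` and `head_qa_es` come from the PR text; `LANGUAGES`, `subset_name`, and `SUBSETS` are hypothetical names used for illustration and are not taken from the actual dataloader.

```python
# Hypothetical sketch: only the names head_qa_en and head_qa_es come from the PR text.
LANGUAGES = ("en", "es")


def subset_name(lang: str, dataset: str = "head_qa") -> str:
    """Build the subset name for one language, e.g. 'head_qa_en'."""
    if lang not in LANGUAGES:
        raise ValueError(f"unsupported language: {lang!r}")
    return f"{dataset}_{lang}"


# One subset per supported language; both hold the same questions.
SUBSETS = [subset_name(lang) for lang in LANGUAGES]
```

Deriving the names from a single language tuple keeps the two subsets in step if another language were ever added.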
Please name your PR after the issue it closes. You can use the following line: "Closes #ISSUE-NUMBER" where you replace the ISSUE-NUMBER with the one corresponding to your dataset. If...
Closes #213
- **Name:** AIMed

### Checkbox
- [x] Confirm that this PR is linked to the dataset issue.
- [x] Create the dataloader script `biodatasets/my_dataset/my_dataset.py` (please use only lowercase and underscore...
## Adding a Dataset
- **Name:** MedSTS
- **Description:** 1,068 sentence pairs annotated by two medical experts with semantic similarity scores of 0-5 (low to high similarity).
- **Task:** STS...

## Adding a Dataset
- **Name:** *ShAReCLEF 2013 Task 2*
- **Description:** *The dataset for Tasks 1 and 2 consists of de-identified clinical free-text notes from the MIMIC II database,...

## Adding a Dataset
- **Name:** ShAReCLEF 2013 Task 1
- **Description:** *None provided*
- **Task:** NER,NED
- **Paper:** https://pubmed.ncbi.nlm.nih.gov/25147248/
- **Data:** https://physionet.org/content/shareclefehealth2013/1.0/
- **License:** DUA-NC

## Adding a Dataset
- **Name:** n2c2 2014 - Deidentification & Heart Disease
- **Description:** *None provided*
- **Task:** NER,DOC_CLASS
- **Paper:** https://pubmed.ncbi.nlm.nih.gov/26225918/
- **Data:** https://portal.dbmi.hms.harvard.edu/projects/n2c2-nlp/
- **License:** DUA-C/NC

## Adding a Dataset
- **Name:** *PlantNorm*
- **Description:** *Named entity disambiguation dataset for plants from PubMed abstracts*
- **Task:** *NER, NED*
- **Paper:** [*A method for named entity normalization...

## Adding a Dataset
- **Name:** *plant-disease*
- **Description:** *Dataset with tagged plant/disease entities, as well as relations on how the plants affect the tagged diseases*
- **Task:** *NER, RE*...

## Adding a Dataset
- **Name:** *PPR Plant Phenotype Relation corpus*
- **Description:** *Dataset with plant and phenotype mentions, as well as relations of how plants/plant extracts affect the phenotypes*...