
Tools for curating biomedical training data for large-scale language modeling

Results 180 biomedical issues

create a utility function that automatically normalizes on id to yield semantic labels
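A minimal sketch of what such a utility might look like, assuming identifiers arrive as strings with an optional database prefix (such as `umls/C0002218`) and that a lookup table from identifier to semantic label is available. The names `normalize_id` and `ids_to_labels` and the lookup table are hypothetical and not part of the repository. The example values are taken from the EU-ADR excerpt further down this page.

```python
from typing import Dict, List, Optional


def normalize_id(raw_id: str) -> str:
    """Strip whitespace and any 'db/' prefix, then upper-case the identifier."""
    return raw_id.strip().split("/")[-1].upper()


def ids_to_labels(
    ids: List[str],
    id_to_label: Dict[str, str],
    default: Optional[str] = None,
) -> List[Optional[str]]:
    """Map each identifier to its semantic label, or `default` if unknown."""
    return [id_to_label.get(normalize_id(i), default) for i in ids]


# Hypothetical lookup table; a real one would be built from the source ontology.
LOOKUP = {"C0002218": "Diseases & Disorders"}

labels = ids_to_labels(["umls/C0002218", "C9999999"], LOOKUP)
```

Returning `default` for unknown identifiers, rather than raising, keeps the function usable on datasets where only some mentions are normalized.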

## Adding a Dataset
- **Name:** TREC-2017 LiveQA
- **Description:** *None provided*
- **Task:** QA
- **Paper:** https://trec.nist.gov/pubs/trec26/papers/Overview-QA.pdf
- **Data:** https://github.com/abachaa/LiveQA_MedicalTask_TREC2017
- **License:** ?

XML
English
QA

From paper: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5029005/ Figure 2. They seem to have an identifier for plants.

bug

From the `README`: ``` 1) Protein-organism-residue triplet relations a) XML Tag : eg: O:SaccharomycesP:Vps4p (End13p), AAA-family ATPase;R:Vps4p-(K179A);A:ATP binding Entity Identifiers = O: Organism, P: Protein, R: Residue, A: Catalytic triad...

bug

From the paper ``` Concept Unique Identifiers (CUI) corresponding to French terms from the UMLS (Lindberg et al., 1993) for single or multi-word terms. For multi-word terms, the...

bug

From https://github.com/jayded/evidence-inference/blob/master/annotations/README.md: ``` Start Evidence: This column represents what index in the text that the “reasoning” from this row starts at (this is inclusive). End Evidence: This column represents what...

bug

```
File: euadr_corpus/16950808.txt Drug-Disorder True concept Tolerance 0 9 annotator1,annotator4,Computer,annotator5 ['umls/C0002218'] 0 Diseases & Disorders
```

This `['umls/C0002218']` seems to be a UMLS code. https://github.com/bigscience-workshop/biomedical/tree/master/bigbio/biodatasets/euadr
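As a rough illustration of how the identifier in the excerpt above could be pulled out, the sketch below extracts UMLS Concept Unique Identifiers from an annotation line with a regular expression. It assumes the `umls/` prefix shown in the excerpt and the standard CUI shape (`C` followed by seven digits). The function name `extract_cuis` is hypothetical, and the actual dataloader may parse the file differently.

```python
import re
from typing import List

# Assumption: CUIs appear as 'umls/C' followed by exactly seven digits.
CUI_PATTERN = re.compile(r"umls/(C\d{7})\b")


def extract_cuis(annotation_line: str) -> List[str]:
    """Return every UMLS CUI found in one annotation line, in order."""
    return CUI_PATTERN.findall(annotation_line)


line = (
    "Drug-Disorder True concept Tolerance 0 9 "
    "annotator1,annotator4,Computer,annotator5 "
    "['umls/C0002218'] 0 Diseases & Disorders"
)
cuis = extract_cuis(line)
```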

bug

From the title: "DisTEMIST corpus: detection and **normalization** of disease mentions in spanish clinical cases" And the website, https://temu.bsc.es/distemist/: ``` DISTEMIST-linking subtrack: requires automatically finding disease mentions in published clinical...

bug

This dataset has a "Metadata" folder with document keywords, allowing for document indexing, i.e. multi-label "TEXT_CLASSIFICATION", but "_SUPPORTED_TASKS" has only NER and TRANSLATION. https://github.com/bigscience-workshop/biomedical/blob/master/bigbio/biodatasets/diann_iber_eval/diann_iber_eval.py

bug

CordNER annotations come from ML models. This means it's like PubTator and should be flagged in `BigBioConfigHelpers`. @galtay https://github.com/bigscience-workshop/biomedical/tree/master/bigbio/biodatasets/cord_ner

bug