Tools for curating biomedical training data for large-scale language modeling

180 biomedical issues

```
In [3]: dsd = load_dataset('bigbio/biodatasets/psytar/psytar.py', name='psytar_bigbio_text',
   ...:                    data_dir='/home/galtay/data/bigbio/psytar/PsyTAR_dataset.xlsx')
Using custom data configuration psytar_bigbio_text-7247dd615c830efa
Reusing dataset psy_tar_dataset (/home/galtay/.cache/huggingface/datasets/psy_tar_dataset/psytar_bigbio_text-7247dd615c830efa/1.0.0/149b2465b2445f8a388bc2f7af48f0d136d246f718f59743564f154ea3c2dfbf)
100%|████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00
```

good first issue

What exactly are the differences between the GNormPlus dataset and the BioCreative II datasets (BC2GM and BC2GN)?

* https://biocreative.bioinformatics.udel.edu/resources/corpora/biocreative-ii-corpus/
* https://www.ncbi.nlm.nih.gov/research/bionlp/Tools/gnormplus/

Currently only GNormPlus is implemented (https://github.com/bigscience-workshop/biomedical/blob/master/bigbio/biodatasets/gnormplus/gnormplus.py), but BLURB uses...

Datasets with k-fold definitions (e.g., GAD) are currently cumbersome to use. Maybe consider always enforcing train/dev/test splits, similar to what BLURB did for HoC and BIOSSES. `source` schema could preserve...

enhancement
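One way to act on the k-fold suggestion is to collapse a fold assignment into a single fixed train/dev/test split by holding out one fold as test and another as dev. The sketch below assumes each example already carries an integer fold index; the helper name `folds_to_splits` and the choice of which folds become dev and test are hypothetical and are not meant to reproduce how BLURB actually built its HoC or BIOSSES splits.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def folds_to_splits(
    examples: Sequence[Tuple[T, int]],
    test_fold: int,
    dev_fold: int,
) -> Dict[str, List[T]]:
    """Collapse (example, fold_index) pairs into fixed train/dev/test splits.

    Hypothetical helper: `test_fold` becomes the test split, `dev_fold`
    becomes the dev split, and every remaining fold is merged into train.
    """
    if test_fold == dev_fold:
        raise ValueError("test_fold and dev_fold must differ")
    splits: Dict[str, List[T]] = defaultdict(list)
    for example, fold in examples:
        if fold == test_fold:
            splits["test"].append(example)
        elif fold == dev_fold:
            splits["dev"].append(example)
        else:
            splits["train"].append(example)
    # Always return all three keys, even when a split ends up empty.
    return {name: splits[name] for name in ("train", "dev", "test")}


if __name__ == "__main__":
    # Toy data (made up for illustration): 10 examples over 5 folds.
    toy = [(f"ex{i}", i % 5) for i in range(10)]
    for name, items in folds_to_splits(toy, test_fold=0, dev_fold=1).items():
        print(name, items)
```

Fixing the held-out folds as parameters, rather than sampling them at load time, keeps the resulting splits reproducible across loads.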

From http://participants-area.bioasq.org/general_information/Task9b/

bug
Medium
DUA
English
QA
JSON

From https://physionet.org/content/mimic-iii-clinical-action/1.0.0/

Medium
DUA
English
Span / Sentence Classification
New Dataset

From http://www.geniaproject.org/genia-corpus/coreference

XML
Medium
CC BY 3.0
English
Coreference

From https://github.com/pubmedqa/pubmedqa

bug
Medium
MIT License
English
QA

From https://species.jensenlab.org

CoNLL
Medium
Public Domain (CC0)
English
NER

From https://portal.dbmi.hms.harvard.edu/projects/n2c2-nlp/

Medium
DUA
English
RE
n2c2
New Dataset