biomedical issues

psytar schema is not implemented correctly

```
In [3]: dsd = load_dataset('bigbio/biodatasets/psytar/psytar.py', name='psytar_bigbio_text', data_dir='/home/galtay/data/
   ...: bigbio/psytar/PsyTAR_dataset.xlsx')
Using custom data configuration psytar_bigbio_text-7247dd615c830efa
Reusing dataset psy_tar_dataset (/home/galtay/.cache/huggingface/datasets/psy_tar_dataset/psytar_bigbio_text-7247dd615c830efa/1.0.0/149b2465b2445f8a388bc2f7af48f0d136d246f718f59743564f154ea3c2dfbf)
100%|████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00
```

galtay

good first issue

change default branch from master to main

galtay

bc2g[mn] vs gnormplus

What exactly are the differences between the gnormplus dataset and the BioCreative II datasets (BC2GM & BC2GN)?

* https://biocreative.bioinformatics.udel.edu/resources/corpora/biocreative-ii-corpus/
* https://www.ncbi.nlm.nih.gov/research/bionlp/Tools/gnormplus/

Currently only gnormplus is implemented (https://github.com/bigscience-workshop/biomedical/blob/master/bigbio/biodatasets/gnormplus/gnormplus.py), but BLURB uses...

galtay

Consider enforcing canonical train/dev/test splits for bigbio schema

Datasets with k-fold definitions (e.g., GAD) are currently cumbersome to use. Maybe consider always enforcing train/dev/test splits, similar to what BLURB did for HoC and BIOSSES. `source` schema could preserve...
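
One way the proposal could look in practice is sketched below. This is only an illustration, not how bigbio or BLURB assign splits: the function name `folds_to_splits`, the split names, and the choice of which fold becomes dev or test are all hypothetical. It assumes the k folds are already available as in-memory lists of examples.

```python
from typing import Dict, List, Sequence, TypeVar

T = TypeVar("T")


def folds_to_splits(
    folds: Sequence[Sequence[T]], dev_fold: int, test_fold: int
) -> Dict[str, List[T]]:
    """Collapse k folds into fixed train/validation/test splits.

    Hypothetical helper: one fold is held out as test, one as validation,
    and every remaining fold is merged into train.
    """
    if dev_fold == test_fold:
        raise ValueError("dev_fold and test_fold must be different folds")
    for index in (dev_fold, test_fold):
        if not 0 <= index < len(folds):
            raise ValueError(f"fold index {index} is out of range")

    splits: Dict[str, List[T]] = {"train": [], "validation": [], "test": []}
    for index, fold in enumerate(folds):
        if index == test_fold:
            key = "test"
        elif index == dev_fold:
            key = "validation"
        else:
            key = "train"
        splits[key].extend(fold)
    return splits


# Toy data: 5 folds with 2 examples each (made-up identifiers).
example_folds = [[f"fold{i}_ex{j}" for j in range(2)] for i in range(5)]
example_splits = folds_to_splits(example_folds, dev_fold=3, test_fold=4)
```

Fixing the dev and test fold indices once, in the loader, would give every user the same splits, while the original fold assignments could still be exposed elsewhere.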

jason-fries