biomedical

Tools for curating biomedical training data for large-scale language modeling

Results 180 biomedical issues

A first look at how to upload a file in this repo (`bigbiohub.py`) to every dataset repo we have in the hub. Uses the huggingface_hub python package https://huggingface.co/docs/huggingface_hub/index @jason-fries @hakunanatasha...
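A minimal sketch of what such a bulk upload could look like, not the script from the PR itself. It assumes the target repos all live under the `bigbio` organisation and that the file keeps the same name on the hub; the helper name `upload_file_to_repos` and the `dry_run` flag are hypothetical and only there for illustration. `HfApi.list_datasets` and `HfApi.upload_file` are the `huggingface_hub` calls being relied on.

```python
from typing import Iterable, List


def upload_file_to_repos(api, repo_ids: Iterable[str], local_path: str,
                         path_in_repo: str, dry_run: bool = True) -> List[str]:
    """Upload one local file to each dataset repo; return the repo ids handled."""
    handled = []
    for repo_id in repo_ids:
        if not dry_run:
            # HfApi.upload_file pushes a single file to the repo as one commit.
            api.upload_file(
                path_or_fileobj=local_path,
                path_in_repo=path_in_repo,
                repo_id=repo_id,
                repo_type="dataset",
            )
        handled.append(repo_id)
    return handled


def main() -> None:
    # Imported here so the helper above can be tried without the package installed.
    from huggingface_hub import HfApi

    api = HfApi()
    # Assumption: every target dataset repo belongs to the "bigbio" organisation.
    repo_ids = [info.id for info in api.list_datasets(author="bigbio")]
    done = upload_file_to_repos(api, repo_ids, "bigbiohub.py", "bigbiohub.py",
                                dry_run=True)
    print(f"{len(done)} repos would receive bigbiohub.py")
```

Calling `main()` with `dry_run=True` only lists the repos; switching it to `False` would need a token with write access to each repo.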

From https://s-baker.net/resource/cei/

CC BY 4.0
English
Topic Classification

Closes #891. Mantra GSC was moved from the original website to GitHub: https://github.com/mi-erasmusmc/Mantra-Gold-Standard-Corpus/tree/main This PR points the loader to the new URL and creates a HF Hub version of...

closes #914

### Checkbox
- [x] Confirm that this PR is linked to the dataset issue.
- [x] Create the dataloader script `hub/hub_repos/my_dataset/my_dataset.py` (please use only lowercase and underscore for...

## Adding a Dataset
- **Name:** *SympTEMIST*
- **Description:** *BioCreative VIII SympTEMIST task https://temu.bsc.es/symptemist/*
- **Task:** *NER, NED*
- **Paper:** *https://zenodo.org/records/10104547*
- **Data:** *https://doi.org/10.5281/zenodo.8223653*
- **License:** *[Creative Commons Attribution 4.0...

### Closes 912
- [x] Confirm that this PR is linked to the dataset issue.
- [x] Create the dataloader script `hub/hub_repos/my_dataset/my_dataset.py` (please use only lowercase and underscore for dataset...

## Adding a Dataset
- **Name:** *SourceData NLP*
- **Description:** *SourceData-NLP is a named entity recognition and entity linking/disambiguation dataset produced through the routine curation of papers during the publication...

## Describe the bug
The links to the bc5cdr dataset are no longer valid.

## Steps to reproduce the bug
```python
from datasets import load_dataset

bc5_bigbio = load_dataset("bigbio/bc5cdr", "bc5cdr_source")
bc5_bigbio...
```

bug

`datasets.load_dataset('bigbio/mantra_gsc')` does not work. There is an implementation of a loader though: https://github.com/bigscience-workshop/biomedical/tree/main/bigbio/biodatasets/mantra_gsc Was this implementation not migrated to the hub?

bug

The passages in the kb schema are empty lists. https://huggingface.co/datasets/bigbio/jnlpba/viewer/jnlpba_bigbio_kb/train

bug
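One way to confirm the empty-passages report is to scan the examples and collect those whose `passages` field is missing or empty. The sketch below is an assumption-laden illustration, not code from the repo: the helper name `find_empty_passages` is hypothetical, and the sample records are made up to mimic the general shape of kb-schema examples rather than taken from jnlpba.

```python
from typing import Any, Dict, Iterable, List


def find_empty_passages(examples: Iterable[Dict[str, Any]]) -> List[str]:
    """Return the ids of examples whose `passages` field is missing or empty."""
    return [
        str(example.get("id", index))  # fall back to the position if there is no id
        for index, example in enumerate(examples)
        if not example.get("passages")
    ]


# Made-up records for illustration only; real examples carry more fields.
sample = [
    {"id": "0", "passages": []},
    {"id": "1", "passages": [{"id": "1-p0", "type": "abstract",
                              "text": ["Some text."], "offsets": [[0, 10]]}]},
]
print(find_empty_passages(sample))
```

Against the real data, the same helper could presumably be fed the result of `load_dataset("bigbio/jnlpba", name="jnlpba_bigbio_kb", split="train")`, with the config name taken from the viewer URL in the report.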