
Tools for curating biomedical training data for large-scale language modeling

Results 174 biomedical issues

Closes #501
### Checkbox
- [x] Confirm that this PR is linked to the dataset issue.
- [x] Create the dataloader script `biodatasets/my_dataset/my_dataset.py` (please use only lowercase and underscore for...

Resolves #695
- Updated DisTEMIST to the final release used in the shared task (5.1)
- Included NED annotations from both entity linking training sets
- Added configs for different subtracks (entities...

## Describe the bug
Some entity offsets for bc5cdr are off by a few characters.
## Steps to reproduce the bug
```python
from bigbio.dataloader import BigBioConfigHelpers

conhelps = BigBioConfigHelpers()  # collects helpers for the available dataset configs
data = conhelps.for_config_name("bc5cdr_bigbio_kb").load_dataset()
# for...
```
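
An offset problem like the one reported above can be checked by slicing the document text at each entity's offsets and comparing the result with the annotated entity text. The following is a minimal sketch, not the project's own test: `find_offset_mismatches` is a hypothetical helper, the document is a toy string rather than a bc5cdr record, and it assumes offsets are character positions relative to the text passed in.

```python
from typing import Dict, List, Tuple


def find_offset_mismatches(doc_text: str, entities: List[Dict]) -> List[Tuple[str, str, str]]:
    """Return (entity id, annotated text, text found at the offsets) for every mismatch."""
    mismatches = []
    for entity in entities:
        # An entity may have several spans; its texts and offsets are parallel lists.
        for span_text, (start, end) in zip(entity["text"], entity["offsets"]):
            found = doc_text[start:end]
            if found != span_text:
                mismatches.append((entity["id"], span_text, found))
    return mismatches


# Toy document for illustration only, not taken from bc5cdr.
doc = "Naloxone reverses the antihypertensive effect of clonidine."
start = doc.find("clonidine")
entities = [
    {"id": "e1", "text": ["clonidine"], "offsets": [[start, start + 9]]},
    # Deliberately shifted by two characters to imitate the reported bug.
    {"id": "e2", "text": ["clonidine"], "offsets": [[start + 2, start + 11]]},
]

for entity_id, expected, found in find_offset_mismatches(doc, entities):
    print(f"{entity_id}: expected {expected!r}, found {found!r}")
```

Running a check of this kind over every example makes it easier to see whether the shift is constant or varies across a document.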

bug

Add the plant-phenotype dataset. Closes #637
New location for dataset (that corrects errors of nested/missing entities): https://github.com/davidkartchner/PPRcorpus
### Checkbox
- [x] Confirm that this PR is linked to the dataset...

Dear all, I am trying to add NED annotations for different datasets (e.g., [Quaero](https://github.com/bigscience-workshop/biomedical/issues/702) and [DisTEMIST](https://github.com/bigscience-workshop/biomedical/issues/695)), and I wonder how to use the `normalized` attribute when there are multiple concept...
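
For context on the question above: in the BigBIO KB schema, an entity's `normalized` attribute is a list of `db_name`/`db_id` pairs, so one way to attach more than one concept is to add more than one entry. The sketch below only illustrates that shape. The entity, database name, and identifiers are placeholders, `concept_ids` is a hypothetical helper, and a flat list does not by itself say whether the concepts are alternatives or parts of a composite.

```python
from typing import Dict, List

# Placeholder entity in the shape of a KB-schema entity; all values are made up.
entity = {
    "id": "doc1_T1",
    "type": "DISEASE",
    "text": ["example mention"],
    "offsets": [[0, 15]],
    "normalized": [
        {"db_name": "EXAMPLE_DB", "db_id": "0000001"},
        {"db_name": "EXAMPLE_DB", "db_id": "0000002"},
    ],
}


def concept_ids(entity: Dict) -> List[str]:
    """Flatten the normalized entries into 'db_name:db_id' strings."""
    return [f'{norm["db_name"]}:{norm["db_id"]}' for norm in entity["normalized"]]


print(concept_ids(entity))
```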

## Describe the bug
`bc7_litcovid` generates instances whose `text` field is `None`, e.g., `{'id': '34', 'document_id': '34219343', 'text': None, 'labels': ['Prevention']}`
## Steps to reproduce the bug
Iterate through the...
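
A quick way to find affected instances is to scan the examples and collect the ids whose `text` is `None` or blank. This is a minimal sketch over plain dictionaries: `ids_with_missing_text` is a hypothetical helper, the first record is the one quoted in the report, and the second is made up for contrast.

```python
from typing import Dict, Iterable, List


def ids_with_missing_text(examples: Iterable[Dict]) -> List[str]:
    """Collect ids of examples whose text is None, empty, or whitespace only."""
    missing = []
    for example in examples:
        text = example.get("text")
        if text is None or not text.strip():
            missing.append(example["id"])
    return missing


examples = [
    # Record quoted in the bug report.
    {"id": "34", "document_id": "34219343", "text": None, "labels": ["Prevention"]},
    # Made-up record with a text value, for contrast.
    {"id": "35", "document_id": "0", "text": "placeholder abstract", "labels": ["Prevention"]},
]

print(ids_with_missing_text(examples))
```

The same function can be applied to any iterable of dictionaries, including a loaded dataset split.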

bug

[thomas2011](https://github.com/bigscience-workshop/biomedical/blob/master/bigbio/biodatasets/thomas2011/thomas2011.py) does not implement passages, only entities, in the KB schema.

bug

Hello, I noticed that this repo has `pmc_patients` for PMC-Patients Task 2: Patient-Patient Similarity (PPS), but there was no dataloader for PMC-Patients Task 1: Patient Note Recognition (PNR), so I...

### INTRO
This introduces a metadata attribute `_TAGS`. The values in `_TAGS` are tags that further classify the task. These tags are meant to be used together with `_SUPPORTED_TASKS`....

A number of these datasets (e.g., `medal`) involve reading large CSV files. This can be considerably sped up if the dataset loader checks for the presence of a library like...
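
The general pattern described here, checking whether an optional faster library is installed and falling back to the standard library otherwise, might look like the sketch below. The issue text is cut off before it names a library, so `pyarrow` is used purely as an illustrative stand-in, and `has_module` and `read_rows` are hypothetical helpers rather than part of the repository.

```python
import csv
import importlib.util
from typing import Dict, List


def has_module(name: str) -> bool:
    """Report whether a top-level module can be imported, without importing it."""
    return importlib.util.find_spec(name) is not None


def read_rows(path: str, prefer_fast: bool = True) -> List[Dict]:
    """Read a CSV file into a list of row dictionaries."""
    if prefer_fast and has_module("pyarrow"):
        # Assumed stand-in for the faster library; imported only if installed.
        from pyarrow import csv as pa_csv

        # Note: pyarrow infers column types, while csv.DictReader yields strings.
        return pa_csv.read_csv(path).to_pylist()
    # Standard-library fallback, always available.
    with open(path, newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))
```

Because the two backends can differ in how they type numeric columns, a loader using this pattern would need to cast values explicitly before yielding examples.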