biomedical

Tools for curating biomedical training data for large-scale language modeling

Results 180 biomedical issues
Sort by recently updated

## Adding a Dataset
- **Name:** *MentSum*
- **Description:** *A Resource for Exploring Summarization of Mental Health Online Posts*
- **Task:** *Text to Text*
- **Paper:** *[link to the dataset...

New Dataset

Sometimes it's more convenient to load files locally than to download them repeatedly (especially large files). If the `data_dir` argument is provided on instantiation, it should override the URL download...

enhancement
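
The override behaviour requested above could be sketched roughly as follows. This is a minimal illustration, not the project's loader code: `resolve_data_path` and the `download` callable are hypothetical names introduced here, and the only assumption taken from the issue is that a provided `data_dir` should take precedence over the URL download.

```python
from pathlib import Path
from typing import Callable, Optional


def resolve_data_path(
    data_dir: Optional[str],
    url: str,
    download: Callable[[str], str],
) -> Path:
    """Prefer a user-supplied local directory; fall back to downloading.

    `download` is a hypothetical stand-in for whatever fetches `url`
    and returns the local path of the downloaded data.
    """
    if data_dir is not None:
        local = Path(data_dir)
        # Fail early rather than silently falling back to a download.
        if not local.exists():
            raise FileNotFoundError(f"data_dir does not exist: {local}")
        return local
    # No local copy was given, so fetch from the URL.
    return Path(download(url))
```

Raising on a missing directory, instead of quietly downloading, is a design choice of this sketch; a loader could equally warn and fall back.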

Hi @GullyBurns and @tonifuc3m! Thanks for contributing to the hackathon! Currently, participants who have implemented fewer than 3 datasets will only be _acknowledged_. Would you be willing to implement 2 more...

## Describe the bug MedMentions can have multiple entries with the same span/text/type, yet different normalisation terms ## Steps to reproduce the bug ```python def conflicting_normalisation(row): span_to_entity = {} for...

schema improvements
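
Since the reproduction snippet above is cut off, here is an independent sketch of how such conflicts might be detected. It does not reconstruct the issue's code. The function name is hypothetical, and the entity fields (`offsets`, `text`, `type`, `normalized` with `db_name`/`db_id`) are assumed to follow a KB-style schema.

```python
from collections import defaultdict


def find_conflicting_normalizations(entities):
    """Return span keys that map to more than one normalisation set.

    Each entity is assumed (hypothetically) to be a dict with
    `offsets`, `text`, `type`, and `normalized` fields.
    """
    seen = defaultdict(set)
    for ent in entities:
        # Identify an annotation by its span, surface text, and type.
        key = (
            tuple(tuple(pair) for pair in ent["offsets"]),
            tuple(ent["text"]),
            ent["type"],
        )
        norm = frozenset((n["db_name"], n["db_id"]) for n in ent["normalized"])
        seen[key].add(norm)
    # Keep only keys seen with at least two distinct normalisations.
    return {key: norms for key, norms in seen.items() if len(norms) > 1}
```

Running this over the entities of each document would list the span/text/type triples whose normalisation terms disagree.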

## Adding a Dataset
- **Name:** *CASI (Clinical Abbreviation Sense Inventory)*
- **Description:** *For our comprehensive sense inventory for clinical abbreviations and acronyms, a total of 440 most frequently used...

New Dataset

**Finished data loader for source schema only, because the Bigbio KB schema does not currently support all features that exist in the source data - per conversation with @jason-fries** -...

schema improvements

Extend #534.

> Can we add in the `tests` case one for every sub function? This should be
> - passage offsets
> - entity offsets
> - event offsets...

enhancement
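
As an illustration of what one such per-function check might look like, the sketch below verifies entity offsets against the document text. The function name and the entity layout (`id`, `offsets` as `[start, end]` pairs, `text` as a list of strings) are assumptions made for this example, not the project's actual test code.

```python
def check_entity_offsets(document_text, entities):
    """Return ids of entities whose offsets do not reproduce their text.

    Assumes (hypothetically) that each entity has an `id`, a list of
    [start, end] character `offsets`, and a parallel list of `text` strings.
    """
    mismatched = []
    for ent in entities:
        # Slice the document once per offset pair and compare to the stored text.
        spans = [document_text[start:end] for start, end in ent["offsets"]]
        if spans != list(ent["text"]):
            mismatched.append(ent["id"])
    return mismatched
```

Passage and event offsets could be checked the same way, each in its own test case so that a failure points at the specific sub-function.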

@galtay this is the script (WIP) as a reference for parsing metadata from a static file

## Adding a Dataset
- **Name:** *RuMedNLI*
- **Description:** *There is a shortage of text medical resources for the Russian language. This is a substantial obstacle in state-of-the-art NLP deep...

DUA
English
NLI
Russian
New Dataset

Closes #427. The dataset contains 8 different `subset_id`s (different dataset settings), each with a `bigbio` and a `source` schema. Furthermore, there is a subset called `mediqa_ans_all` which includes all data (articles, sections,...