Create dataloader for PMC-Patients Task 1: Patient Note Recognition (PNR)

Open holylovenia opened this issue 1 year ago • 0 comments

Hello, I noticed that this repo has pmc_patients for PMC-Patients Task 2: Patient-Patient Similarity (PPS), but no dataloader for PMC-Patients Task 1: Patient Note Recognition (PNR), so I created this pull request to add one. I am not sure whether it would be better to merge it into the existing dataloader (pmc_patients), so for now I have made it a separate dataloader.

Regarding the dataloader schema: since PMC-Patients PNR does not fit any of the schemas provided here, I followed @galtay's recommendation (relayed by @SamuelCahyawijaya; thanks for passing it on) to implement only the source schema and leave _SUPPORTED_TASKS empty.

Please let me know if there's anything I can help with.

Checkbox

  • [ ] Confirm that this PR is linked to the dataset issue.
  • [x] Create the dataloader script biodatasets/my_dataset/my_dataset.py (please use only lowercase and underscore for dataset naming).
  • [x] Provide values for the _CITATION, _DATASETNAME, _DESCRIPTION, _HOMEPAGE, _LICENSE, _URLs, _SUPPORTED_TASKS, _SOURCE_VERSION, and _BIGBIO_VERSION variables.
  • [x] Implement _info(), _split_generators() and _generate_examples() in dataloader script.
  • [ ] Make sure that the BUILDER_CONFIGS class attribute is a list with at least one BigBioConfig for the source schema and one for a bigbio schema.
  • [x] Confirm dataloader script works with datasets.load_dataset function.
  • [x] Confirm that your dataloader script passes the test suite run with python -m tests.test_bigbio biodatasets/my_dataset/my_dataset.py.
  • [ ] If my dataset is local, I have provided an output of the unit-tests in the PR (please copy paste). This is OPTIONAL for public datasets, as we can test these without access to the data files.

holylovenia · Jul 08 '22 09:07