
Mantra GSC new location (closes #891)

Open phlobo opened this issue 5 months ago • 0 comments

Closes #891

Mantra GSC was moved from the original website to GitHub: https://github.com/mi-erasmusmc/Mantra-Gold-Standard-Corpus/tree/main

This PR points the loader at the new URL and creates an HF Hub version of the existing loader script for mantra_gsc.

If the following information is NOT present in the issue, please populate:

  • Name: Mantra GSC
  • Description: Multi-lingual Mantra Gold Standard Corpus
  • Paper: https://academic.oup.com/jamia/article/22/5/948/930067
  • Data: https://github.com/mi-erasmusmc/Mantra-Gold-Standard-Corpus/tree/main

Checkbox

  • [x] Confirm that this PR is linked to the dataset issue.
  • [x] Create the dataloader script hub/hub_repos/my_dataset/my_dataset.py (please use only lowercase and underscore for dataset naming).
  • [x] Provide values for the _CITATION, _DATASETNAME, _DESCRIPTION, _HOMEPAGE, _LICENSE, _URLs, _SUPPORTED_TASKS, _SOURCE_VERSION, and _BIGBIO_VERSION variables.
  • [x] Implement _info(), _split_generators() and _generate_examples() in dataloader script.
  • [x] Make sure that the BUILDER_CONFIGS class attribute is a list with at least one BigBioConfig for the source schema and one for a bigbio schema.
  • [x] Confirm dataloader script works with datasets.load_dataset function.
  • [x] Confirm that your dataloader script passes the test suite run with python -m tests.test_bigbio_hub <dataset_name> [--data_dir /path/to/local/data] --test_local.
  • [x] If my dataset is local, I have provided an output of the unit-tests in the PR (please copy paste). This is OPTIONAL for public datasets, as we can test these without access to the data files.
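
Two of the checklist items above (the lowercase-and-underscore dataset name, and a BUILDER_CONFIGS list with at least one source-schema and one bigbio-schema config) can be sanity-checked with a small standalone sketch. This is not the project's test suite: `ConfigStub` is a hypothetical stand-in for BigBioConfig, the schema strings `"source"` and `"bigbio_kb"` and the two config names are assumptions for illustration, and allowing digits in dataset names is also an assumption.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfigStub:
    """Hypothetical stand-in for BigBioConfig; only the fields used below."""
    name: str
    schema: str


def is_valid_dataset_name(name: str) -> bool:
    """Return True if the name uses only lowercase letters, digits, and underscores.

    Accepting digits is an assumption; the checklist only mentions
    lowercase and underscore.
    """
    return re.fullmatch(r"[a-z0-9_]+", name) is not None


def has_required_schemas(configs) -> bool:
    """Return True if there is at least one source config and one bigbio config.

    Assumes source configs use schema == "source" and bigbio configs
    use a schema string starting with "bigbio_".
    """
    schemas = [c.schema for c in configs]
    has_source = any(s == "source" for s in schemas)
    has_bigbio = any(s.startswith("bigbio_") for s in schemas)
    return has_source and has_bigbio


# Illustrative configs only; the real loader may define different names.
configs = [
    ConfigStub(name="mantra_gsc_source", schema="source"),
    ConfigStub(name="mantra_gsc_bigbio_kb", schema="bigbio_kb"),
]

if __name__ == "__main__":
    print(is_valid_dataset_name("mantra_gsc"))
    print(has_required_schemas(configs))
```

The checks are deliberately independent of the `datasets` library so they can run without network access or the corpus files.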

phlobo, Mar 20 '24 14:03