Closes #169

giyaseddin opened this issue (7 comments)

Checkbox

  • [x] Confirm that this PR is linked to the dataset issue.
  • [x] Create the dataloader script biodatasets/my_dataset/my_dataset.py (please use only lowercase and underscore for dataset naming).
  • [x] Provide values for the _CITATION, _DATASETNAME, _DESCRIPTION, _HOMEPAGE, _LICENSE, _URLs, _SUPPORTED_TASKS, _SOURCE_VERSION, and _BIGBIO_VERSION variables.
  • [x] Implement _info(), _split_generators() and _generate_examples() in dataloader script.
  • [x] Make sure that the BUILDER_CONFIGS class attribute is a list with at least one BigBioConfig for the source schema and one for a bigbio schema.
  • [x] Confirm dataloader script works with datasets.load_dataset function.
  • [x] Confirm that your dataloader script passes the test suite run with python -m tests.test_bigbio biodatasets/my_dataset/my_dataset.py.
  • [x] If my dataset is local, I have provided an output of the unit tests in the PR (please copy-paste it). This is OPTIONAL for public datasets, as we can test these without access to the data files.

giyaseddin, Apr 10 '22 10:04

This PR closes #169

giyaseddin, Apr 10 '22 10:04

Thank you for checking out my comments, @giyaseddin! I am trying to inspect the dataset with:

from datasets import load_dataset
ds = load_dataset("biodatasets/medquad/medquad.py", "medquad_source")

But I get this error:

    142         raise NotImplementedError("Only `source` and `bigbio_qa` schemas are implemented.")
    144     return datasets.DatasetInfo(
    145         description=_DESCRIPTION,
    146         features=features,
   (...)
    149         citation=_CITATION,
    150     )
--> 152 def _load_qa_from_xml(self, file_paths) -> List[dict[str, str | None]]:
    153     """
    154     This method traverses the whole list of the downloaded XML files and extracts Q&A pairs.
    155     Returns the extracted Q&As and the base directory of the dumped json file that contains them all.
    156     """
    157     assert len(file_paths)

TypeError: unsupported operand type(s) for |: 'type' and 'NoneType'

Could you please make sure we can load both the source and bigbio configs without errors? Thank you!

sg-wbi, Apr 14 '22 08:04

Hey @giyaseddin! Do you plan to keep working on this?

regel-corpus, Apr 20 '22 07:04

Hey @regel-corpus, I will push my last modifications ASAP.

giyaseddin, Apr 20 '22 08:04

Could you please check whether the current version downloads correctly, @sg-wbi?

giyaseddin, May 17 '22 18:05

Hi @giyaseddin, I pulled the latest code, and it seems like this error still occurs upon loading. Could you check again whether your updates fix it?
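
To re-check loading after each push, a small smoke check can try every config name and collect the failures instead of stopping at the first one. This is a hedged sketch: `check_configs` is a hypothetical helper, and `medquad_bigbio_qa` is an assumed config name following the `bigbio_qa` schema mentioned in the traceback; the real names are whatever the script's BUILDER_CONFIGS define. The `loader` parameter defaults to `datasets.load_dataset` but can be swapped for a stub.

```python
# Hypothetical smoke check: try each dataset config, record errors instead of raising.
from typing import Callable, Dict, Iterable, Optional


def check_configs(
    script_path: str,
    config_names: Iterable[str],
    loader: Optional[Callable[..., object]] = None,
) -> Dict[str, Optional[str]]:
    """Map each config name to None if it loaded, else to the error text."""
    if loader is None:
        # Imported lazily so the helper can be exercised with a stub loader.
        from datasets import load_dataset

        loader = load_dataset
    results: Dict[str, Optional[str]] = {}
    for name in config_names:
        try:
            loader(script_path, name)
            results[name] = None
        except Exception as exc:  # keep going so every config gets a verdict
            results[name] = f"{type(exc).__name__}: {exc}"
    return results
```

For example, `check_configs("biodatasets/medquad/medquad.py", ["medquad_source", "medquad_bigbio_qa"])` returns a dict in which each config maps to None on success or to the error text, such as the TypeError about the `|` operand, on failure.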

rosalinesway, May 17 '22 19:05

Hi @giyaseddin, thanks for continuing to work on this dataset. Would it be possible to pull the up-to-date master into your branch? There are some inconsistencies between your branch and master that prevent the unit tests from running. Thanks!

ruisi-su, May 28 '22 19:05