Interleaving Datasets Bug Fix

Open Shamik-07 opened this issue 2 years ago • 1 comments

Currently, the meta feature is not dropped before the pubmed_dataset_streamed and law_dataset_streamed datasets are interleaved. The two datasets define meta with different fields, so their features cannot be aligned, which results in the error below:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-26-544b4eed8cfe> in <module>
----> 1 combined_dataset = interleave_datasets([pubmed_dataset_streamed, law_dataset_streamed])
      2 list(islice(combined_dataset, 2))

2 frames
/usr/local/lib/python3.8/dist-packages/datasets/features/features.py in _check_if_features_can_be_aligned(features_list)
   2052         for k, v in features.items():
   2053             if not (isinstance(v, Value) and v.dtype == "null") and name2feature[k] != v:
-> 2054                 raise ValueError(
   2055                     f'The features can\'t be aligned because the key {k} of features {features} has unexpected type - {v} (expected either {name2feature[k]} or Value("null").'
   2056                 )

ValueError: The features can't be aligned because the key meta of features {'meta': {'case_ID': Value(dtype='string', id=None), 'case_jurisdiction': Value(dtype='string', id=None), 'date_created': Value(dtype='string', id=None)}, 'text': Value(dtype='string', id=None)} has unexpected type - {'case_ID': Value(dtype='string', id=None), 'case_jurisdiction': Value(dtype='string', id=None), 'date_created': Value(dtype='string', id=None)} (expected either {'pmid': Value(dtype='int64', id=None), 'language': Value(dtype='string', id=None)} or Value("null").

What does this PR do?

Drops the meta feature from both datasets before the interleaving operation, so that only the shared text feature remains and the features can be aligned.

Fixes # (issue)

Who can review?

Feel free to tag members/contributors who may be interested in your PR.

Shamik-07, Jan 21 '23 07:01
