Interleaving Datasets Bug Fix
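Currently, the `meta` feature is not dropped before `pubmed_dataset_streamed` and `law_dataset_streamed` are interleaved. The two datasets define `meta` with different nested fields (`pmid` and `language` in the PubMed dataset; `case_ID`, `case_jurisdiction`, and `date_created` in the law dataset), so their features can't be aligned and `interleave_datasets` raises the following error: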
```
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-26-544b4eed8cfe> in <module>
----> 1 combined_dataset = interleave_datasets([pubmed_dataset_streamed, law_dataset_streamed])
      2 list(islice(combined_dataset, 2))

2 frames

/usr/local/lib/python3.8/dist-packages/datasets/features/features.py in _check_if_features_can_be_aligned(features_list)
   2052         for k, v in features.items():
   2053             if not (isinstance(v, Value) and v.dtype == "null") and name2feature[k] != v:
-> 2054                 raise ValueError(
   2055                     f'The features can\'t be aligned because the key {k} of features {features} has unexpected type - {v} (expected either {name2feature[k]} or Value("null").'
   2056                 )

ValueError: The features can't be aligned because the key meta of features {'meta': {'case_ID': Value(dtype='string', id=None), 'case_jurisdiction': Value(dtype='string', id=None), 'date_created': Value(dtype='string', id=None)}, 'text': Value(dtype='string', id=None)} has unexpected type - {'case_ID': Value(dtype='string', id=None), 'case_jurisdiction': Value(dtype='string', id=None), 'date_created': Value(dtype='string', id=None)} (expected either {'pmid': Value(dtype='int64', id=None), 'language': Value(dtype='string', id=None)} or Value("null").
```
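To see why the `meta` column blocks alignment, here is a simplified, pure-Python sketch modeled on the loop shown in the traceback. It is not the `datasets` implementation: plain dicts of dtype strings stand in for `Features`, the string `"null"` stands in for `Value("null")`, and the first pass that fills `name2feature` is an assumption about code the traceback doesn't show. The helpers `check_features_can_be_aligned` and `drop_column` are hypothetical.

```python
# Hypothetical feature schemas: dtype strings stand in for `Value(...)`, and
# the nested `meta` fields are the ones listed in the error message.
pubmed_features = {
    "meta": {"pmid": "int64", "language": "string"},
    "text": "string",
}
law_features = {
    "meta": {
        "case_ID": "string",
        "case_jurisdiction": "string",
        "date_created": "string",
    },
    "text": "string",
}


def check_features_can_be_aligned(features_list):
    """Raise ValueError if a key has two different non-null types (simplified sketch)."""
    # Assumed first pass: remember the first non-null type seen for each key.
    name2feature = {}
    for features in features_list:
        for k, v in features.items():
            if k not in name2feature or name2feature[k] == "null":
                name2feature[k] = v
    # Second pass, mirroring the loop in the traceback: every non-null type
    # must match the type recorded for that key.
    for features in features_list:
        for k, v in features.items():
            if v != "null" and name2feature[k] != v:
                raise ValueError(
                    f"The features can't be aligned because the key {k} "
                    f"has unexpected type - {v} (expected {name2feature[k]})"
                )


def drop_column(features, column):
    """Return a copy of `features` without `column` (analogous to remove_columns)."""
    return {k: v for k, v in features.items() if k != column}


try:
    check_features_can_be_aligned([pubmed_features, law_features])
except ValueError as err:
    print(err)  # the two `meta` structs differ, so alignment fails

# With `meta` dropped, both schemas are {"text": "string"} and the check passes.
check_features_can_be_aligned(
    [drop_column(pubmed_features, "meta"), drop_column(law_features, "meta")]
)
```

As in the real error, the law dataset's `meta` is reported as the unexpected type because the PubMed schema is seen first.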
What does this PR do?
Drops the `meta` feature before the interleaving operation.
Fixes # (issue)
Who can review?
Feel free to tag members/contributors who may be interested in your PR.