datasets
Flexible Loader
Feature request
Can we have a utility function that calls load_from_disk when given a local path and load_dataset when given the name of a dataset on the HF Hub?
It can be something as simple as this one:
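import os
from datasets import load_dataset, load_from_disk

def load_hf_dataset(path_or_name):
    # A path that exists locally is treated as a save_to_disk directory.
    if os.path.exists(path_or_name):
        return load_from_disk(path_or_name)
    # Otherwise, treat the argument as a dataset name on the HF Hub.
    return load_dataset(path_or_name)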
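One caveat with the os.path.exists check is that a local folder of raw files (CSV, JSON, Parquet) also exists on disk, yet it is normally read with load_dataset rather than load_from_disk. A slightly more defensive variant, offered only as a sketch, could look for the marker files that save_to_disk writes. The marker names below (state.json for a Dataset, dataset_dict.json for a DatasetDict) are an assumption based on current behavior of the library and may change between versions; looks_like_saved_dataset is a hypothetical helper name, not part of datasets.

```python
import os

# Assumption: save_to_disk writes "state.json" for a Dataset and
# "dataset_dict.json" for a DatasetDict. Verify against your datasets version.
_SAVED_MARKERS = ("state.json", "dataset_dict.json")


def looks_like_saved_dataset(path):
    """Return True if `path` is a directory that contains a save_to_disk marker file."""
    return os.path.isdir(path) and any(
        os.path.isfile(os.path.join(path, marker)) for marker in _SAVED_MARKERS
    )


def load_hf_dataset(path_or_name, **load_dataset_kwargs):
    """Hypothetical flexible loader: load_from_disk for saved datasets, else load_dataset."""
    # Imported lazily so the path check above can be used without datasets installed.
    from datasets import load_dataset, load_from_disk

    if looks_like_saved_dataset(path_or_name):
        return load_from_disk(path_or_name)
    # Hub names and local folders of raw data files both go through load_dataset.
    return load_dataset(path_or_name, **load_dataset_kwargs)
```

With this variant, load_hf_dataset("./my_saved_ds") would use load_from_disk only if the folder was produced by save_to_disk, and extra keyword arguments such as split are forwarded to load_dataset only.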
Motivation
This can be done inside the user codebase, too, but in my experience, it becomes repetitive code.
Your contribution
I can open a pull request.
Ideally save_to_disk should save in a format compatible with load_dataset, wdyt ?
> Ideally save_to_disk should save in a format compatible with load_dataset, wdyt ?
That would be perfect. If not that, then at least a flexible loader.
@lhoestq For now, you can use this small utility library: nanoml
from nanoml.data import load_dataset_flexible
I actively develop and maintain this utility library and am open to contributors. Please open issues, PRs, or feature requests.