Flexible Loader

Open dipta007 opened this issue 9 months ago • 3 comments

Feature request

Can we have a utility function that uses load_from_disk when given a local path and load_dataset when given the name of a dataset on the Hugging Face Hub?

It can be something as simple as this one:

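import os
from datasets import load_dataset, load_from_disk

def load_hf_dataset(path_or_name):
    # A local path is assumed to be a directory written by save_to_disk.
    if os.path.exists(path_or_name):
        return load_from_disk(path_or_name)
    # Anything else is treated as a dataset name on the Hugging Face Hub.
    return load_dataset(path_or_name)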
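
One caveat with the snippet above: os.path.exists is also true for a local folder of raw files (CSV, Parquet, and so on), which load_dataset can read but load_from_disk cannot. A slightly stricter variant could look for the marker files that save_to_disk writes. This is only a sketch: the helper name looks_like_saved_dataset is hypothetical, and the marker file names (state.json for a Dataset, dataset_dict.json for a DatasetDict) are an assumption about the current on-disk layout, not a documented contract.

```python
import os


def looks_like_saved_dataset(path):
    """Return True if `path` looks like a directory written by save_to_disk."""
    if not os.path.isdir(path):
        return False
    # Assumption: save_to_disk writes state.json for a Dataset and
    # dataset_dict.json for a DatasetDict.
    markers = ("state.json", "dataset_dict.json")
    return any(os.path.isfile(os.path.join(path, m)) for m in markers)


def load_hf_dataset(path_or_name, **kwargs):
    # Imported lazily so the check above works without `datasets` installed.
    from datasets import load_dataset, load_from_disk

    if looks_like_saved_dataset(path_or_name):
        return load_from_disk(path_or_name)
    # Hub names and local folders of raw files both go through load_dataset;
    # extra keyword arguments are forwarded only on this branch.
    return load_dataset(path_or_name, **kwargs)
```

With this check, a local directory of raw data files still falls through to load_dataset instead of failing inside load_from_disk.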

Motivation

This can be done inside the user codebase, too, but in my experience, it becomes repetitive code.

Your contribution

I can open a pull request.

dipta007 avatar Mar 09 '25 16:03 dipta007

Ideally save_to_disk should save in a format compatible with load_dataset, wdyt ?

lhoestq avatar Mar 13 '25 11:03 lhoestq

Ideally save_to_disk should save in a format compatible with load_dataset, wdyt ?

That would be perfect. If that is not possible, a flexible loader would at least help.

dipta007 avatar Mar 17 '25 20:03 dipta007

@lhoestq For now, you can use this small utility library: nanoml

from nanoml.data import load_dataset_flexible

I actively develop and maintain this utility library, and it is open to contributors. Please open issues, PRs, or feature requests.

dipta007 avatar Mar 27 '25 23:03 dipta007