scikit-learn_bench Download dataset scripts FIX + new option to download datasets from benchmark configs

Open leriomaggio opened this issue 2 years ago • 1 comments

This PR includes changes to the datasets/load_datasets.py script and related documentation.

In more detail: the previous version of the script had a bug that caused it to ignore any dataset name passed via the -d option and download all datasets instead.

This has been fixed in this PR, along with some improvements to the internal documentation (i.e. the help message).

Moreover, a new option has been added to the script, namely -c, --configs.

This option overrides the manual selection of datasets to download by automatically extracting the names of the required datasets from the input configuration file(s).

This is particularly useful when preparing to run multiple benchmark experiments, since all the datasets they need can be downloaded in one step.
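For illustration only, the override behaviour described above could be sketched roughly as follows. This is not the code from this PR: the JSON layout (`cases` / `dataset` / `name`), the `dest` name for -d, and the helper names `dataset_names_from_configs`, `build_parser` and `select_datasets` are all assumptions made for the example.

```python
import argparse
import json
from pathlib import Path


def dataset_names_from_configs(config_paths):
    """Collect the unique dataset names referenced by the given config files.

    Assumes a hypothetical JSON layout:
    {"cases": [{"dataset": [{"name": "..."}, ...]}, ...]}
    """
    names = []
    for path in config_paths:
        config = json.loads(Path(path).read_text())
        for case in config.get("cases", []):
            for dataset in case.get("dataset", []):
                name = dataset.get("name")
                # Preserve first-seen order and skip duplicates.
                if name and name not in names:
                    names.append(name)
    return names


def build_parser():
    parser = argparse.ArgumentParser(description="Download benchmark datasets")
    # -d: manual selection of datasets (the dest name is an assumption).
    parser.add_argument("-d", dest="datasets", nargs="*", default=[],
                        help="names of the datasets to download")
    # -c/--configs: derive the selection from benchmark config files.
    parser.add_argument("-c", "--configs", nargs="*", default=[],
                        help="config file(s) to extract dataset names from")
    return parser


def select_datasets(args):
    """Return the datasets to download; configs take precedence over -d."""
    if args.configs:
        return dataset_names_from_configs(args.configs)
    return list(args.datasets)
```

With this sketch, passing both -d and -c yields only the names found in the config files, while passing -d alone yields exactly the names given on the command line.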

A new README.md file has been added to the datasets package, in line with the other packages included in the benchmark. This documentation file includes detailed instructions on how to run and use the load_datasets utility.

Also, a new section has been added to the main README.md file to highlight the new features.

Feb 16 '23 19:02 leriomaggio

Dataset loading will change soon as part of a complete refactoring of the benchmarks. --dummy-run or another runner argument will be used to only download datasets during a run.

Mar 20 '23 21:03 Alexsandruss

Implemented with --prefetch-datasets flag in refactoring (https://github.com/IntelPython/scikit-learn_bench/pull/133).

Jul 26 '24 09:07 Alexsandruss
