scikit-learn_bench
Download dataset scripts FIX + new option to download datasets from benchmark configs
This PR includes changes to the `datasets/load_datasets.py` script and related documentation.
In more detail: the previous version of the script had a bug that caused it to ignore any dataset name passed via the `-d` option and to download all the datasets instead.
This PR fixes that bug and also improves the internal documentation (i.e. the help message).
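The intended behaviour of the `-d` option can be illustrated with a minimal sketch. This is not the code from `load_datasets.py`: the `--datasets` long form, the `LOADERS` registry, the placeholder dataset names, and the `data` root directory are all assumptions made for illustration only.

```python
import argparse
from typing import Callable, Dict, List, Optional


def make_loader(name: str) -> Callable[[str], str]:
    """Return a placeholder loader that only reports where data would go."""
    def loader(root: str) -> str:
        return f"{root}/{name}"
    return loader


# Hypothetical registry; the real script's dataset names and loaders differ.
LOADERS: Dict[str, Callable[[str], str]] = {
    name: make_loader(name) for name in ("dataset_a", "dataset_b", "dataset_c")
}


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Download datasets (sketch)")
    # Assumed option shape: zero or more dataset names after -d.
    parser.add_argument("-d", "--datasets", nargs="*", default=None,
                        help="names of the datasets to download")
    return parser


def select_datasets(requested: Optional[List[str]]) -> List[str]:
    """Honour the user's selection; fall back to all datasets only if none given."""
    if not requested:
        return sorted(LOADERS)
    unknown = [name for name in requested if name not in LOADERS]
    if unknown:
        raise ValueError(f"unknown dataset(s): {', '.join(unknown)}")
    return list(requested)


def main(argv: Optional[List[str]] = None) -> List[str]:
    args = build_parser().parse_args(argv)
    return [LOADERS[name]("data") for name in select_datasets(args.datasets)]
```

In this sketch the bug described above would correspond to `main` iterating over every key in `LOADERS` regardless of `args.datasets`.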
Moreover, a new option has been added to the script, namely `-c`, `--configs`.
This option overrides the manual selection of datasets to download by automatically extracting the names of the required datasets from the input configuration file(s).
This is particularly useful when preparing to run multiple benchmark experiments, because all the necessary datasets can be downloaded in one step.
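How dataset names might be collected from configuration files can be sketched as follows. The layout assumed here (a JSON object with a `cases` list, where each case has a `dataset` list of entries carrying a `name` key) is a guess for illustration and may not match the actual benchmark configs; the function names are hypothetical as well.

```python
import json
from pathlib import Path
from typing import Any, Dict, Iterable, List, Set


def names_from_config(config: Dict[str, Any]) -> Set[str]:
    """Collect dataset names from one parsed config (assumed layout)."""
    names: Set[str] = set()
    for case in config.get("cases", []):
        for entry in case.get("dataset", []):
            name = entry.get("name")
            if name:  # skip entries without a name, e.g. generated data
                names.add(name)
    return names


def names_from_files(paths: Iterable[str]) -> List[str]:
    """Merge the dataset names found in several config files, without duplicates."""
    names: Set[str] = set()
    for path in paths:
        with Path(path).open(encoding="utf-8") as fh:
            names |= names_from_config(json.load(fh))
    return sorted(names)
```

Collecting the names into a set means that a dataset shared by several configs is requested only once.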
A new `README.md` file has been added to the `datasets` package, in line with the other packages included in the benchmark.
This documentation file includes detailed instructions on how to run and use the `load_datasets` utility.
Also, a new section has been added to the main `README.md` file to highlight the new features.
Dataset loading will change soon as part of a complete refactoring of the benchmarks: `--dummy-run` or another runner argument will be used for downloading datasets only during a run.
Implemented with the `--prefetch-datasets` flag in the refactoring (https://github.com/IntelPython/scikit-learn_bench/pull/133).