
Download dataset scripts FIX + new option to download datasets from benchmark configs

Open leriomaggio opened this issue 1 year ago • 1 comments

This PR includes changes to the datasets/load_datasets.py script and related documentation.

In more detail: the previous version of the script had a bug that caused it to ignore any dataset name passed via the -d option and download all the datasets instead.

This PR fixes that bug and also improves the internal documentation (i.e. the help message).
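For illustration only, the intended behaviour of -d can be sketched as below. This is not the code from datasets/load_datasets.py: the long option name `--datasets`, the `nargs="*"` form, the `select_datasets` helper and the dataset names are all assumptions made for the example.

```python
import argparse

# Hypothetical dataset names; the real script defines its own registry.
AVAILABLE_DATASETS = ["dataset_a", "dataset_b", "dataset_c"]


def build_parser():
    """Build a parser with a -d option (long name and nargs are assumed)."""
    parser = argparse.ArgumentParser(description="Download benchmark datasets")
    parser.add_argument(
        "-d", "--datasets",
        nargs="*",
        default=None,
        help="names of the datasets to download (default: all)",
    )
    return parser


def select_datasets(requested, available):
    """Return only the requested names, or every name if none were given."""
    if not requested:
        return list(available)
    unknown = [name for name in requested if name not in available]
    if unknown:
        raise ValueError(f"Unknown dataset(s): {', '.join(unknown)}")
    return list(requested)


# Simulate `-d dataset_b` on the command line.
args = build_parser().parse_args(["-d", "dataset_b"])
selected = select_datasets(args.datasets, AVAILABLE_DATASETS)
print(selected)
```

The point of the sketch is that the download loop should iterate over the filtered selection rather than over the full registry, which is the behaviour the bug broke.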

Moreover, a new option has been added to the script, namely -c, --configs.

This option overrides the manual selection of datasets to download: the names of the required datasets are extracted automatically from the input configuration file(s).

This is particularly useful when preparing to run multiple benchmark experiments, because all the datasets they need can be downloaded in a single step.
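As a rough sketch of how dataset names could be collected from configuration files, consider the following. It assumes JSON configs with a top-level `cases` list whose entries carry a `dataset` list of objects with a `name` key; that layout and both function names are assumptions for this example, not a description of the actual implementation.

```python
import json


def dataset_names_from_config(config):
    """Collect dataset names from one parsed config (assumed layout)."""
    names = set()
    for case in config.get("cases", []):
        for dataset in case.get("dataset", []):
            name = dataset.get("name")
            if name:
                names.add(name)
    return names


def dataset_names_from_files(paths):
    """Merge dataset names across several JSON config files."""
    names = set()
    for path in paths:
        with open(path, encoding="utf-8") as handle:
            names |= dataset_names_from_config(json.load(handle))
    return sorted(names)


# In-memory example with hypothetical dataset names.
example_config = {
    "cases": [
        {"algorithm": "kmeans", "dataset": [{"name": "dataset_a"}]},
        {"algorithm": "svm", "dataset": [{"name": "dataset_b"}, {"name": "dataset_a"}]},
    ]
}
print(sorted(dataset_names_from_config(example_config)))
```

Using a set means a dataset shared by several benchmark cases or config files is only queued for download once.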

A new README.md file has been added to the datasets package, in line with the other packages included in the benchmark. It contains detailed instructions on how to run and use the load_datasets utility.

Also, a new section has been added to the main README.md file to highlight the new features.

leriomaggio, Feb 16 '23 19:02