scikit-learn_bench
scikit-learn_bench copied to clipboard
Refactoring of benchmarks
Benchmarks rework
Entry points: README, Developer Guide
Key features:
- Benchmarks runner, report generator and separate benchmarks are implemented to run as python modules
- Benchmarking config specification change
- Report generator based on handling results as
pd.DataFrame
's to perform comparison operations and use openpyxl functionality to write dataframes natively
Solved issues
- https://github.com/IntelPython/scikit-learn_bench/issues/12 (better reporting format with HW info and speedup output)
- https://github.com/IntelPython/scikit-learn_bench/issues/75 (ensures sklearnex is used and outputs warning if not)
- https://github.com/IntelPython/scikit-learn_bench/issues/82 (implements support for all classes with sklearn estimator API)
- https://github.com/IntelPython/scikit-learn_bench/issues/120 (allows prefetching of datasets in parallel before running of benchmarks)
- https://github.com/IntelPython/scikit-learn_bench/issues/131 (implements single row inference mode)
@Alexsandruss Thank you for your hard work here!
I think this pull request is getting quite large and includes multiple features. It contains too much code to be reviewed. It's useful to have a way to split it into smaller tasks for your reviewers so that they can focus on specific portions of it. I am pretty sure that we already can merge some of features here into master, if we would have separate PRs for them.
Wouldn't it be better to move an already finished feature to a separate PR? In this case we can quickly and efficiently review and merge it.
This CODEOWNERS file contains errors
So far I am also running into an issue downloading datasets in 3.9 with urllib, specifically with the SSL certs. To get it to download I had to export
SSL_CERT_DIR=/etc/ssl/certs
To get it to properly download. Somehow this wasnt a problem with 3.10 ¯\(ツ)/¯
Also, some of the errors seen in the PR associated with KNN/sklearn in the CI checks have been solved with this PR: https://github.com/scikit-learn/scikit-learn/commit/f473d7e5bb4aaaf2bb292aa7b0c6195b470daf39
But likely a handling of Errors needs to occur to make sure that it will run through even though part of sklearn doesn't
So far I am also running into an issue downloading datasets in 3.9 with urllib, specifically with the SSL certs. To get it to download I had to export
SSL_CERT_DIR=/etc/ssl/certs
To get it to properly download. Somehow this wasnt a problem with 3.10 ¯**(ツ)**/¯
It looks like problem of specific python environment. Python 3.9 from conda-forge works fine.
Also, some of the errors seen in the PR associated with KNN/sklearn in the CI checks have been solved with this PR: scikit-learn/scikit-learn@f473d7e
But likely a handling of Errors needs to occur to make sure that it will run through even though part of sklearn doesn't
There is no external fix for this sklearn version, so previous version is pinned to dependencies from now.
Naive question: could we catch Errors in such a way to allow for reports to be generated even in the case of a failure in a certain test case? Something like how its done in pytest, or would that be out of scope of this PR.
This really really needs a usage guide and docs. Barrier to entry for testing/evaluating this PR is high.
@Alexsandruss Please add TODO list into the description.
Is there a sample ml_benchmarks CI job ran with this branch? Or an infra branch associated with the update? I imagine some refactoring of infra support would be needed.
Are any TODOs from CodeFactor hits meant to be addressed in this PR? Or would they be addressed in subsequent PRs?
Is there a sample ml_benchmarks CI job ran with this branch? Or an infra branch associated with the update? I imagine some refactoring of infra support would be needed.
FYI this is in progress but functional - infra PR #684
@Alexsandruss is there additional work you are planning to do on this or is it fully ready for review/merge at this point?
@Alexsandruss could you please resolve conflicts for report_generator/default_report_gen_config.json
@Alexsandruss After discussions with others, for this PR, just limit everything to 3.10 so as to be able to merge it ASAP Please also change things related to private CI related to the python versions (so some of my previous comments can be disregarded short-term).