scikit-learn_bench icon indicating copy to clipboard operation
scikit-learn_bench copied to clipboard

Refactoring of benchmarks

Open Alexsandruss opened this issue 1 year ago • 13 comments

Benchmarks rework

Entry points: README, Developer Guide

Key features:

  • Benchmarks runner, report generator and separate benchmarks are implemented to run as python modules
  • Benchmarking config specification change
  • Report generator based on handling results as pd.DataFrame's to perform comparison operations and use openpyxl functionality to write dataframes natively

Solved issues

  • https://github.com/IntelPython/scikit-learn_bench/issues/12 (better reporting format with HW info and speedup output)
  • https://github.com/IntelPython/scikit-learn_bench/issues/75 (ensures sklearnex is used and outputs warning if not)
  • https://github.com/IntelPython/scikit-learn_bench/issues/82 (implements support for all classes with sklearn estimator API)
  • https://github.com/IntelPython/scikit-learn_bench/issues/120 (allows prefetching of datasets in parallel before running of benchmarks)
  • https://github.com/IntelPython/scikit-learn_bench/issues/131 (implements single row inference mode)

Alexsandruss avatar Mar 30 '23 10:03 Alexsandruss

@Alexsandruss Thank you for your hard work here!

I think this pull request is getting quite large and includes multiple features. It contains too much code to be reviewed. It's useful to have a way to split it into smaller tasks for your reviewers so that they can focus on specific portions of it. I am pretty sure that we already can merge some of features here into master, if we would have separate PRs for them.

Wouldn't it be better to move an already finished feature to a separate PR? In this case we can quickly and efficiently review and merge it.

samir-nasibli avatar May 19 '23 01:05 samir-nasibli

This CODEOWNERS file contains errors

napetrov avatar Jun 09 '23 14:06 napetrov

So far I am also running into an issue downloading datasets in 3.9 with urllib, specifically with the SSL certs. To get it to download I had to export

SSL_CERT_DIR=/etc/ssl/certs

To get it to properly download. Somehow this wasnt a problem with 3.10 ¯\(ツ)

icfaust avatar Jul 12 '23 11:07 icfaust

Also, some of the errors seen in the PR associated with KNN/sklearn in the CI checks have been solved with this PR: https://github.com/scikit-learn/scikit-learn/commit/f473d7e5bb4aaaf2bb292aa7b0c6195b470daf39

But likely a handling of Errors needs to occur to make sure that it will run through even though part of sklearn doesn't

icfaust avatar Jul 12 '23 11:07 icfaust

So far I am also running into an issue downloading datasets in 3.9 with urllib, specifically with the SSL certs. To get it to download I had to export

SSL_CERT_DIR=/etc/ssl/certs

To get it to properly download. Somehow this wasnt a problem with 3.10 ¯**(ツ)**/¯

It looks like problem of specific python environment. Python 3.9 from conda-forge works fine.

Alexsandruss avatar Jul 12 '23 15:07 Alexsandruss

Also, some of the errors seen in the PR associated with KNN/sklearn in the CI checks have been solved with this PR: scikit-learn/scikit-learn@f473d7e

But likely a handling of Errors needs to occur to make sure that it will run through even though part of sklearn doesn't

There is no external fix for this sklearn version, so previous version is pinned to dependencies from now.

Alexsandruss avatar Jul 12 '23 15:07 Alexsandruss

Naive question: could we catch Errors in such a way to allow for reports to be generated even in the case of a failure in a certain test case? Something like how its done in pytest, or would that be out of scope of this PR.

icfaust avatar Jul 21 '23 15:07 icfaust

This really really needs a usage guide and docs. Barrier to entry for testing/evaluating this PR is high.

icfaust avatar Nov 06 '23 12:11 icfaust

@Alexsandruss Please add TODO list into the description.

samir-nasibli avatar Dec 04 '23 11:12 samir-nasibli

Is there a sample ml_benchmarks CI job ran with this branch? Or an infra branch associated with the update? I imagine some refactoring of infra support would be needed.

ethanglaser avatar Dec 07 '23 16:12 ethanglaser

Are any TODOs from CodeFactor hits meant to be addressed in this PR? Or would they be addressed in subsequent PRs?

ethanglaser avatar Dec 07 '23 20:12 ethanglaser

Is there a sample ml_benchmarks CI job ran with this branch? Or an infra branch associated with the update? I imagine some refactoring of infra support would be needed.

FYI this is in progress but functional - infra PR #684

ethanglaser avatar Dec 15 '23 00:12 ethanglaser

@Alexsandruss is there additional work you are planning to do on this or is it fully ready for review/merge at this point?

ethanglaser avatar Jan 31 '24 21:01 ethanglaser

@Alexsandruss could you please resolve conflicts for report_generator/default_report_gen_config.json

samir-nasibli avatar May 03 '24 12:05 samir-nasibli

@Alexsandruss After discussions with others, for this PR, just limit everything to 3.10 so as to be able to merge it ASAP Please also change things related to private CI related to the python versions (so some of my previous comments can be disregarded short-term).

icfaust avatar Jun 25 '24 04:06 icfaust