scikit-learn icon indicating copy to clipboard operation
scikit-learn copied to clipboard

ASV Add benchmarks for `PairwiseDistancesReductions`

Open jjerphan opened this issue 3 years ago • 4 comments

Reference Issues/PRs

Relates to #22587

What does this implement/fix? Explain your changes.

I think investing some time in coming up with a proper benchmarking suite for PairwiseDistancesReductions is sensible: I want preventing quick benchmark whose results might be misleading (this is want I have been doing until now thinking I could gain time but I in retrospective I think I've lost some).

Also, longer-term-wise, I want us, our future selves and future maintainers to be able to easily and confidently perform benchmark between revisions in case changes need to be performed.

Any other comments?

We might want to generate plots on top of asv results.

jjerphan avatar Aug 05 '22 09:08 jjerphan

Already very helpful it seems :)

Micky774 avatar Aug 05 '22 19:08 Micky774

Thanks, @Micky774. I hope this will help you with future experimentations.

jjerphan avatar Aug 05 '22 20:08 jjerphan

I think the original intention of the ASV benchmark suite was to continuously monitor user facing classes with a public API that remains stable across scikit-learn versions so as to detect performance regressions.

You're right. I think we should not pollute this ASV benchmark suite with this addition. Probably we can come up with another ASV benchmarks suite for private interfaces (in another repo)?

@jjerphan how long does it take to run the full cartesian products of parameters of this PR?

It can take quite a while for the datasets that are defined here. IIRC, hours on a 128 cores machines.

jjerphan avatar Aug 09 '22 14:08 jjerphan

You're right. I think we should not pollute this ASV benchmark suite with this addition. Probably we can come up with another ASV benchmarks suite for private interfaces (in another repo)?

Or maybe a private/ subfolder under asv_benchmarks/ if the generic CRON job can be configured to skip the private benchmarks to avoid making them run slower.

It can take quite a while for the datasets that are defined here. IIRC, hours on a 128 cores machines.

We should definitely not run this in the weekly benchmark run that does not have this many cores.

ogrisel avatar Aug 09 '22 15:08 ogrisel

I am coming back to this issue because I need a proper benchmark suite for all the PairwiseDistancesReductions for several pull requests.

Yet, it feels like it would be better to have another distinct project for such a suite because this is rather niche and we probably do not want to maintain it as part of the scikit-learn repository.

What do you think?

In the meantime, I will turn this PR to draft and will maintain its branch on my fork based on my needs and will update when needed.

jjerphan avatar Oct 21 '22 09:10 jjerphan

Closing in favor of https://github.com/jjerphan/pairwise-distances-reductions-asv-suite.

jjerphan avatar Dec 05 '22 10:12 jjerphan