scikit-learn-intelex icon indicating copy to clipboard operation
scikit-learn-intelex copied to clipboard

ENH: `BasicStatistics` API change

Open olegkkruglov opened this issue 1 year ago • 15 comments

Changes proposed in this pull request:

  • Refactored basic_statistics.cpp file
  • Changed API for BasicStatistics class to save results to class attribute instead of returning dict with result to be consistent with IncrementalBasicStatistics and other estimators. Saved attributes are arrays if input data is 2d and numbers if input data is 1d (like it is done in numpy)
  • Updated tests
  • Added support of n_jobs and usm_ndarray to BasicStatistics
  • Rename compute_raw method to _compute_raw since it is private and designed for internal use

olegkkruglov avatar Jan 10 '24 15:01 olegkkruglov

/intelci: run

olegkkruglov avatar Jan 21 '24 20:01 olegkkruglov

I'm wondering if BasicStatistics and IncrementalBasicStatistics shouldn't be two functions, possibly located in 'metrics' which returns a dict. @KulikovNikita @olegkkruglov @samir-nasibli

icfaust avatar Jan 30 '24 13:01 icfaust

/intelci: run

olegkkruglov avatar Apr 18 '24 11:04 olegkkruglov

/intelci: run

olegkkruglov avatar Apr 19 '24 13:04 olegkkruglov

/intelci: run

olegkkruglov avatar Apr 30 '24 14:04 olegkkruglov

This would drop basic statistics compute completely and break API - I guess we would need to deprecate it first?

ethanglaser avatar Apr 30 '24 15:04 ethanglaser

This would drop basic statistics compute completely and break API - I guess we would need to deprecate it first?

I would say that it is not breaking but more like fixing and making more consistent. As far as I understand, we have to use fit instead of compute in all estimators and compute was used here accidentally. But if we still need to use deprecation in such case then do we have any guidelines about how should it be done?

olegkkruglov avatar May 02 '24 10:05 olegkkruglov

This would drop basic statistics compute completely and break API - I guess we would need to deprecate it first?

I would say that it is not breaking but more like fixing and making more consistent. As far as I understand, we have to use fit instead of compute in all estimators and compute was used here accidentally. But if we still need to use deprecation in such case then do we have any guidelines about how should it be done?

Agreed that it is fixing, but that doesn't mean its not breaking - anyone that is using our BasicStatistics API will be impacted by this change. @Alexsandruss how do we generally handle this? (ie should we add deprecation warning message but leave compute() and just call fit() from it initially or can we remove compute() immediately)

I would also expect that this may require an accompanying docs update, spmd interfaces, etc.

ethanglaser avatar May 02 '24 14:05 ethanglaser

This would drop basic statistics compute completely and break API - I guess we would need to deprecate it first?

I would say that it is not breaking but more like fixing and making more consistent. As far as I understand, we have to use fit instead of compute in all estimators and compute was used here accidentally. But if we still need to use deprecation in such case then do we have any guidelines about how should it be done?

Agreed that it is fixing, but that doesn't mean its not breaking - anyone that is using our BasicStatistics API will be impacted by this change. @Alexsandruss how do we generally handle this? (ie should we add deprecation warning message but leave compute() and just call fit() from it initially or can we remove compute() immediately)

I would also expect that this may require an accompanying docs update, spmd interfaces, etc.

We can leave compute as well with deprecation warning.

samir-nasibli avatar May 03 '24 11:05 samir-nasibli

I'm wondering if BasicStatistics and IncrementalBasicStatistics shouldn't be two functions, possibly located in 'metrics' which returns a dict. @KulikovNikita @olegkkruglov @samir-nasibli

I think this is good idea to discuss. In case if we will accept this, deprecation warnings also needed here. As an option we can also add function for BS in metrics, just reusing original BS. Adding just function for IncrementalBasicStatistics idea will be harder.

Personally I am good with current solution as well.

samir-nasibli avatar May 03 '24 12:05 samir-nasibli

I would also expect that this may require an accompanying docs update, spmd interfaces, etc.

@ethanglaser deprecation is added, spmd part is updated, but I can't see any docs containing information about old interfaces which should be updated according to these changes. Am I missing something?

olegkkruglov avatar May 07 '24 13:05 olegkkruglov

/intelci: run

olegkkruglov avatar May 10 '24 13:05 olegkkruglov

@olegkkruglov please rebase you branch

samir-nasibli avatar Jun 13 '24 13:06 samir-nasibli

/intelci: run

olegkkruglov avatar Jun 17 '24 12:06 olegkkruglov

/intelci: run

Vika-F avatar Jun 26 '24 14:06 Vika-F

@olegkkruglov please rebase your branch and run internal CI as well. @icfaust , @ethanglaser @md-shafiul-alam @Alexsandruss Dear reviewers, expecting your approvals. If the comments are addressed please mark them as resolved.

samir-nasibli avatar Jul 04 '24 14:07 samir-nasibli

@olegkkruglov by the way I see that there is not information about online algos in our docs, that should be addressed.

This is not the scope of the current PR, just wanted to highlight.

samir-nasibli avatar Jul 05 '24 10:07 samir-nasibli

/intelci: run

olegkkruglov avatar Jul 05 '24 13:07 olegkkruglov