scikit-learn-intelex
scikit-learn-intelex copied to clipboard
ENH: `BasicStatistics` API change
Changes proposed in this pull request:
- Refactored basic_statistics.cpp file
- Changed API for BasicStatistics class to save results to class attribute instead of returning dict with result to be consistent with IncrementalBasicStatistics and other estimators. Saved attributes are arrays if input data is 2d and numbers if input data is 1d (like it is done in numpy)
- Updated tests
- Added support of n_jobs and usm_ndarray to BasicStatistics
- Rename compute_raw method to _compute_raw since it is private and designed for internal use
/intelci: run
I'm wondering if BasicStatistics and IncrementalBasicStatistics shouldn't be two functions, possibly located in 'metrics' which returns a dict. @KulikovNikita @olegkkruglov @samir-nasibli
/intelci: run
/intelci: run
/intelci: run
This would drop basic statistics compute completely and break API - I guess we would need to deprecate it first?
This would drop basic statistics compute completely and break API - I guess we would need to deprecate it first?
I would say that it is not breaking but more like fixing and making more consistent. As far as I understand, we have to use fit instead of compute in all estimators and compute was used here accidentally. But if we still need to use deprecation in such case then do we have any guidelines about how should it be done?
This would drop basic statistics compute completely and break API - I guess we would need to deprecate it first?
I would say that it is not breaking but more like fixing and making more consistent. As far as I understand, we have to use fit instead of compute in all estimators and compute was used here accidentally. But if we still need to use deprecation in such case then do we have any guidelines about how should it be done?
Agreed that it is fixing, but that doesn't mean its not breaking - anyone that is using our BasicStatistics API will be impacted by this change. @Alexsandruss how do we generally handle this? (ie should we add deprecation warning message but leave compute() and just call fit() from it initially or can we remove compute() immediately)
I would also expect that this may require an accompanying docs update, spmd interfaces, etc.
This would drop basic statistics compute completely and break API - I guess we would need to deprecate it first?
I would say that it is not breaking but more like fixing and making more consistent. As far as I understand, we have to use fit instead of compute in all estimators and compute was used here accidentally. But if we still need to use deprecation in such case then do we have any guidelines about how should it be done?
Agreed that it is fixing, but that doesn't mean its not breaking - anyone that is using our BasicStatistics API will be impacted by this change. @Alexsandruss how do we generally handle this? (ie should we add deprecation warning message but leave compute() and just call fit() from it initially or can we remove compute() immediately)
I would also expect that this may require an accompanying docs update, spmd interfaces, etc.
We can leave compute
as well with deprecation warning.
I'm wondering if BasicStatistics and IncrementalBasicStatistics shouldn't be two functions, possibly located in 'metrics' which returns a dict. @KulikovNikita @olegkkruglov @samir-nasibli
I think this is good idea to discuss. In case if we will accept this, deprecation warnings also needed here. As an option we can also add function for BS in metrics, just reusing original BS. Adding just function for IncrementalBasicStatistics idea will be harder.
Personally I am good with current solution as well.
I would also expect that this may require an accompanying docs update, spmd interfaces, etc.
@ethanglaser deprecation is added, spmd part is updated, but I can't see any docs containing information about old interfaces which should be updated according to these changes. Am I missing something?
/intelci: run
@olegkkruglov please rebase you branch
/intelci: run
/intelci: run
@olegkkruglov please rebase your branch and run internal CI as well. @icfaust , @ethanglaser @md-shafiul-alam @Alexsandruss Dear reviewers, expecting your approvals. If the comments are addressed please mark them as resolved.
@olegkkruglov by the way I see that there is not information about online algos in our docs, that should be addressed.
This is not the scope of the current PR, just wanted to highlight.
/intelci: run