Confirmed with @FBruzzesi on the direction of the PR. This discussion took take place in #620

Description

Supporting sample_weight for HierarchicalPredictor, HierarchicalRegressor, HierarchicalClassifier.

Fixes #620

Type of change

[ ] Bug fix (non-breaking change which fixes an issue)
[x] New feature (non-breaking change which adds functionality)
[ ] Breaking change (fix or feature that would cause existing functionality to not work as expected)

Checklist:

[x] My code follows the style guidelines (ruff)
[x] I have commented my code, particularly in hard-to-understand areas
[x] I have made corresponding changes to the documentation (also to the readme.md)
[ ] I have added tests that prove my fix is effective or that my feature works
[ ] I have added tests to check whether the new feature adheres to the sklearn convention
[ ] New and existing unit tests pass locally with my changes

Mar 25 '25 20:03 AhmedThahir

@FBruzzesi, this is the best I could do, have not been able to understand how to proceed. If you could take some time to assist, it would be great.

=========================================================================== short test summary info ===========================================================================
FAILED tests/test_meta/test_hierarchical_predictor.py::test_sklearn_compatible_estimator[HierarchicalRegressor(estimator=LinearRegression(),groups=0)-check_sample_weights_not_an_array] - ValueError: DataFrame constructor not properly called!
FAILED tests/test_meta/test_hierarchical_predictor.py::test_sklearn_compatible_estimator[HierarchicalRegressor(estimator=LinearRegression(),groups=0)-check_sample_weight_equivalence_on_dense_data] - ZeroDivisionError: Weights sum to zero, can't be normalized
FAILED tests/test_meta/test_hierarchical_predictor.py::test_sklearn_compatible_estimator[HierarchicalRegressor(estimator=LinearRegression(),groups=0)-check_sample_weight_equivalence_on_sparse_data] - ValueError: Estimator does not work on sparse matrices
================================================================= 3 failed, 144 passed, 10 skipped in 11.49s ==================================================================

Mar 25 '25 21:03 AhmedThahir

Thanks for the contribution @AhmedThahir 🚀

@FBruzzesi, this is the best I could do, have not been able to understand how to proceed. If you could take some time to assist, it would be great.

I will take a look later today or tomorrow 😇

Mar 25 '25 21:03 FBruzzesi

Alright!

Will work on the:

suggested code changes
additional test cases

Will get back to you once done tonight/tomorrow.

Best Regards Ahmed Thahir LinkedIn https://www.linkedin.com/in/AhmedThahir | YouTube https://www.youtube.com/channel/UCZRDn0ZVbEYEHKVmMFBI6zQ

On Wed, 26 Mar 2025 at 2:23 AM Francesco Bruzzesi @.***> wrote:

@.**** commented on this pull request.

Hey @AhmedThahir https://github.com/AhmedThahir thanks again for the contribution, this is off a great start! I think we are already quite close!

I left a few suggestions in the code. Additionally to those fixes, could you add a few test cases?

In sklego/meta/hierarchical_predictor.py https://github.com/koaning/scikit-lego/pull/737#discussion_r2013012976:
   try:
       self.estimator_supports_sample_weight = "sample_weight" in inspect.signature(self.estimator.fit).parameters
   except Exception:
       self.estimator_supports_sample_weight = False
For a scikit-learn compatible estimator you cannot do any operation in init method. You can assign such attribute only during fit, and it must be (semi) private, with a trailing _

In sklego/meta/hierarchical_predictor.py https://github.com/koaning/scikit-lego/pull/737#discussion_r2013014762:

@@ -281,12 +289,24 @@ def fit(self, X, y=None): if self.n_features_in_ < 1: msg = "Found 0 features, while a minimum of 1 if required." raise ValueError(msg)
   self.has_sw_ = sample_weight is not None
   if self.has_sw_ and not self.estimator_supports_sample_weight:
       msg = f"Estimator does not support sample_weight."
       raise ValueError(msg)
   sample_weight = _check_sample_weight(sample_weight, X, None, ensure_non_negative=True)
Here you probably need to cast X to native since scikit-learn does not (yet) work with narwhals objects, also dtype argument has already None as default: ⬇️ Suggested change
   sample_weight = _check_sample_weight(sample_weight, X, None, ensure_non_negative=True)
   sample_weight = _check_sample_weight(sample_weight, X.to_native(), ensure_non_negative=True)
In sklego/meta/hierarchical_predictor.py https://github.com/koaning/scikit-lego/pull/737#discussion_r2013015988:
     _y = nw.to_native(grp_frame[self._TARGET_NAME])
   return clone(self.estimator).fit(_X, _y)
   args = [_X, _y]
   if self.estimator_supports_sample_weight and self.has_sw_:
and self.has_sw_ can probably be skipped - passing all ones should behave the same as not passing any sample weight (since the estimator supports them)

— Reply to this email directly, view it on GitHub https://github.com/koaning/scikit-lego/pull/737#pullrequestreview-2715331492, or unsubscribe https://github.com/notifications/unsubscribe-auth/AWUEQREUVXLWMRPY3TUNVED2WHCMJAVCNFSM6AAAAABZYVAV5SVHI2DSMVQWIX3LMV43YUDVNRWFEZLROVSXG5CSMV3GSZLXHMZDOMJVGMZTCNBZGI . You are receiving this because you were mentioned.Message ID: @.***>

Mar 26 '25 06:03 AhmedThahir

busy few days at my internship, will get back to you on this as soon as possible, hopefully by the end of the weekend.

Mar 27 '25 07:03 AhmedThahir

Still failing these tests when I run pytest tests/test_meta/test_hierarchical_predictor.py

sklego/meta/_grouped_utils.py:64: ValueError
=========================================================================== short test summary info ===========================================================================
FAILED tests/test_meta/test_hierarchical_predictor.py::test_sklearn_compatible_estimator[HierarchicalRegressor(estimator=LinearRegression(),groups=0)-check_sample_weights_not_an_array] - ValueError: DataFrame constructor not properly called!
FAILED tests/test_meta/test_hierarchical_predictor.py::test_sklearn_compatible_estimator[HierarchicalRegressor(estimator=LinearRegression(),groups=0)-check_sample_weight_equivalence_on_dense_data] - ZeroDivisionError: Weights sum to zero, can't be normalized
FAILED tests/test_meta/test_hierarchical_predictor.py::test_sklearn_compatible_estimator[HierarchicalRegressor(estimator=LinearRegression(),groups=0)-check_sample_weight_equivalence_on_sparse_data] - ValueError: Estimator does not work on sparse matrices

Mar 29 '25 07:03 AhmedThahir

Also, @FBruzzesi , don't the default sklearn sample weight tests (which I have failed above) take care of the tests? What other tests would be required in our case?

Mar 29 '25 07:03 AhmedThahir

Hey @AhmedThahir thanks for pushing the changes

Still failing these tests when I run pytest tests/test_meta/test_hierarchical_predictor.py

I am trying to debug the tests locally. From what I can see:

check_sample_weights_not_an_array (ValueError: DataFrame constructor not properly called!): raises in line 277 because X is not an array type. I think it's ok to raise since some "structure" in the input is required anyway for Hierarchical predictors, however I would raise a more informative error message.
check_sample_weight_equivalence_on_dense_data (ZeroDivisionError: Weights sum to zero, can't be normalized) - the issue is with the group column being a random continuous value from the automatic sklearn check, hence not really a group-like

One similar issue I can think of is, what should happen if one sample_weight is non-zero, but one group has all zeros in the array subset? I don't have an answer for that, but I think that would be ok to raise.

@koaning considering we are already skipping 9 sklearn compatible estimator check for Hierarchical, I would consider removing that entirely and come up with some ad-hoc tests. What do you think?

Mar 29 '25 09:03 FBruzzesi

Any action required from my side ?

Apr 01 '25 10:04 AhmedThahir

Any action required from my side ?

Hey @AhmedThahir, not really! If you are interested you can think about how we can test most of the hierchical functionalities and edge cases eventually

Apr 01 '25 11:04 FBruzzesi

@koaning considering we are already skipping 9 sklearn compatible estimator check for Hierarchical, I would consider removing that entirely and come up with some ad-hoc tests. What do you think?

Yeah, agree. Optimise for ease of maintainance here :)

Apr 14 '25 11:04 koaning

Any action required from my side ?

Hey @AhmedThahir, not really! If you are interested you can think about how we can test most of the hierchical functionalities and edge cases eventually

I'm not sure if I'm quite sure how to proceed here. the inbuilt check sample weights seems to take care of all cases that I can think of.

Apr 14 '25 14:04 AhmedThahir

I'm not sure if I'm quite sure how to proceed here. the inbuilt check sample weights seems to take care of all cases that I can think of.

Hey @AhmedThahir, what I meant is bigger scoped actually. For now you might:

Add the failing tests to the list of those to skip (check_sample_weights_not_an_array, check_sample_weight_equivalence_on_dense_data, check_sample_weight_equivalence_on_sparse_data)
Introduce new dedicated tests to check sample weight behave as expected in the dataframe/array cases

Apr 14 '25 19:04 FBruzzesi

feat(hierarchical): support sample_weight

Description

Type of change

Checklist:

I left a few suggestions in the code. Additionally to those fixes, could you add a few test cases?

For a scikit-learn compatible estimator you cannot do any operation in init method. You can assign such attribute only during fit, and it must be (semi) private, with a trailing _