Envisalink integration + alarm-panel card displays keypad unnecessarily

Open andornaut opened this issue 1 year ago • 10 comments

The problem

Envisalink integration + alarm-panel card displays keypad unnecessarily.

Expected behavior (core-2024.5.5 and earlier):

  • alarm-panel card does not display a keypad if envisalink.code is specified in configuration.yaml

Actual behavior (core-2024.6.0 through core-2024.6.2):

  • alarm-panel card does display a keypad if envisalink.code is specified in configuration.yaml

[Screenshot: dashboard]

Related to #118668

What version of Home Assistant Core has the issue?

core-2024.6.2

What was the last working version of Home Assistant Core?

core-2024.5.5

What type of installation are you running?

Home Assistant Container

Integration causing the issue

Envisalink

Link to integration documentation on our website

https://www.home-assistant.io/integrations/envisalink/

Diagnostics information

No response

Example YAML snippet

envisalink:
  host: !secret envisalink_host
  user_name: !secret envisalink_username
  password: !secret envisalink_password
  code: !secret envisalink_code
  evl_version: 4
  panel_type: DSC

Anything in the logs that might be useful for us?

No response

Additional information

Dashboard YAML:

type: alarm-panel
states:
  - arm_home
  - arm_away
entity: alarm_control_panel.home_alarm
name: Alarm

andornaut avatar Jun 12 '24 01:06 andornaut

So for simple cases, where metadata is only used in fit and transform expects no metadata, we're fine. But things get a bit trickier once a step's transform method accepts the same metadata as that step's fit.

Specifically, in this test:

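# The test below is quoted from the PR's test suite; it assumes numpy (np),
# pytest, make_pipeline, and sklearn's metadata-routing test helpers
# (ConsumingTransformer, _Registry, check_recorded_metadata, and the
# enable_slep006 fixture) are available in scope.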
@pytest.mark.usefixtures("enable_slep006")
@pytest.mark.parametrize("method", ["fit", "fit_transform"])
def test_transform_input_pipeline(method):
    """Test that with transform_input, data is correctly transformed for each step."""

    def get_transformer(registry, sample_weight, metadata):
        """Get a transformer with requests set."""
        return (
            ConsumingTransformer(registry=registry)
            .set_fit_request(sample_weight=sample_weight, metadata=metadata)
            .set_transform_request(sample_weight=sample_weight, metadata=metadata)
        )

    def get_pipeline():
        """Get a pipeline and corresponding registries.

        The pipeline has 4 steps, with different request values set to test different
        cases. One is aliased.
        """
        registry_1, registry_2, registry_3, registry_4 = (
            _Registry(),
            _Registry(),
            _Registry(),
            _Registry(),
        )
        pipe = make_pipeline(
            get_transformer(registry_1, sample_weight=True, metadata=True),
            get_transformer(registry_2, sample_weight=False, metadata=False),
            get_transformer(registry_3, sample_weight=True, metadata=True),
            get_transformer(registry_4, sample_weight="other_weights", metadata=True),
            transform_input=["sample_weight"],
        )
        return pipe, registry_1, registry_2, registry_3, registry_4

    def check_metadata(registry, methods, **metadata):
        """Check that the right metadata was recorded for the given methods."""
        assert registry
        for estimator in registry:
            for method in methods:
                check_recorded_metadata(
                    estimator,
                    method=method,
                    **metadata,
                )

    X = np.array([[1, 2], [3, 4]])
    y = np.array([0, 1])
    sample_weight = np.array([[1, 2]])
    other_weights = np.array([[30, 40]])
    metadata = np.array([[100, 200]])

    pipe, registry_1, registry_2, registry_3, registry_4 = get_pipeline()
    pipe.fit(
        X,
        y,
        sample_weight=sample_weight,
        other_weights=other_weights,
        metadata=metadata,
    )

    check_metadata(
        registry_1, ["fit", "transform"], sample_weight=sample_weight, metadata=metadata
    )
    check_metadata(registry_2, ["fit", "transform"])
    check_metadata(
        registry_3,
        ["fit", "transform"],
        sample_weight=sample_weight + 2,
        metadata=metadata,
    )
    check_metadata(
        registry_4,
        method.split("_"),  # ["fit", "transform"] if "fit_transform", ["fit"] otherwise
        sample_weight=other_weights + 3,
        metadata=metadata,
    )

Step 3 receives transformed data in its transform method during the pipeline's fit, because all metadata listed in transform_input is transformed. But when step3.transform is called a second time, the metadata is not transformed (because I haven't implemented it in pipeline.transform yet).

The question is, what should be the expected behavior?

Do we want transform_input to only transform metadata when calling fit of sub-estimators? That's a bit tricky, because all TransformerMixin estimators implement a fit_transform which accepts all metadata together, which means a metadata value (if fit and transform use the same name) is either transformed or not transformed for both methods at once. (I wish we didn't have fit_transform in the first place; it's giving us so much headache.)

adrinjalali avatar May 08 '24 09:05 adrinjalali

Actually, in TransformerMixin we have:

        if _routing_enabled():
            transform_params = self.get_metadata_routing().consumes(
                method="transform", params=fit_params.keys()
            )
            if transform_params:
                warnings.warn(
                    (
                        f"This object ({self.__class__.__name__}) has a `transform`"
                        " method which consumes metadata, but `fit_transform` does not"
                        " forward metadata to `transform`. Please implement a custom"
                        " `fit_transform` method to forward metadata to `transform` as"
                        " well. Alternatively, you can explicitly do"
                        " `set_transform_request`and set all values to `False` to"
                        " disable metadata routed to `transform`, if that's an option."
                    ),
                    UserWarning,
                )

and we never send anything to .transform. So in Pipeline we can also assume things are only transformed for fit, as far as scikit-learn itself is concerned.

However, for third-party transformers that have their own fit_transform and route parameters, things can become tricky, as the example in the previous comment shows.
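
To make this concrete, here is a minimal sketch that triggers the warning quoted above (assuming a scikit-learn version with metadata routing; the transformer and metadata names are made up for illustration):

import numpy as np
import sklearn
from sklearn.base import BaseEstimator, TransformerMixin

sklearn.set_config(enable_metadata_routing=True)

class WeightedTransformer(TransformerMixin, BaseEstimator):
    def fit(self, X, y=None, sample_weight=None):
        return self

    def transform(self, X, sample_weight=None):
        # sample_weight is requested for transform below, but the default
        # TransformerMixin.fit_transform never forwards it here
        return X

X, y = np.ones((3, 2)), np.zeros(3)
trs = WeightedTransformer().set_transform_request(sample_weight=True)
# Emits the UserWarning quoted above: `transform` consumes sample_weight,
# but fit_transform only forwards it to `fit`.
trs.fit_transform(X, y, sample_weight=np.ones(3))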

adrinjalali avatar May 13 '24 08:05 adrinjalali

Another question is, do we want to have this syntactic sugar?

pipe = make_pipeline(
    StandardScaler(),
    HistGradientBoostingClassifier(..., early_stopping=True)
).fit(X, y, X_val, y_val)

In the above code, early_stopping=True would:

  • change the default request values so that the user doesn't have to type .set_fit_request(X_val=True, y_val=True)
  • set something on the estimator instance which tells the pipeline that X_val is of the same nature as X, and therefore should be transformed

It wouldn't change what we have now implemented in Pipeline in this PR, but would make it easier for the user. Not sure if it's too magical for us though.

For that to happen, HGBC needs to have something like:

class HistGradientBoostingClassifier(...):
    ...

    def get_metadata_routing(self):
        routing = super().get_metadata_routing()
        if self.early_stopping:
            # request X_val and y_val for fit by default
            routing.fit.add(X_val=True, y_val=True)
        return routing

    def __sklearn_get_transforming_data__(self):
        # metadata that is "X-like" and should be transformed by preceding steps
        return ["X_val"]

And Pipeline would look for info in __sklearn_get_transforming_data__ if it exists.
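
A rough sketch of what that lookup could look like inside Pipeline (purely illustrative; the helper name is made up and the dunder is the hypothetical protocol proposed above, not an existing API):

def _metadata_to_transform(step, transform_input):
    """Names of routed metadata to transform before they reach `step`."""
    names = set(transform_input or [])
    # Hypothetical protocol: an estimator such as
    # HistGradientBoostingClassifier(early_stopping=True) could declare that
    # X_val is "X-like" and must go through the preceding transformers.
    getter = getattr(step, "__sklearn_get_transforming_data__", None)
    if getter is not None:
        names.update(getter())
    return names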

cc @glemaitre

It goes in the direction of having more default routing info, which @ogrisel really likes (ref #26179).

Note that this could come later separately as an enhancement to this PR.

adrinjalali avatar May 14 '24 09:05 adrinjalali

There is an issue with testing metadata routing in more complex situations (which has come up in this PR); it requires some fixes (adding the parent, or caller, to how metadata is recorded in the testing estimators), which are now included in this PR.

adrinjalali avatar May 26 '24 17:05 adrinjalali

I checked lightgbm and xgboost; they both take a list of tuples as validation sets, which makes me wonder why, and whether this proposal is enough!
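
For reference, a minimal sketch of that existing interface, using xgboost's scikit-learn API (lightgbm's eval_set is analogous); this is the status quo, not the proposal discussed above:

from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from xgboost import XGBRegressor

X, y = make_regression(n_samples=200, n_features=10, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)

model = XGBRegressor(early_stopping_rounds=3)
model.fit(
    X_train,
    y_train,
    # a list of (X, y) tuples; by default the last entry is used for early stopping
    eval_set=[(X_train, y_train), (X_val, y_val)],
)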

adrinjalali avatar Sep 02 '24 13:09 adrinjalali

I checked lightgbm and xgboost; they both take a list of tuples as validation sets, which makes me wonder why, and whether this proposal is enough!

Let's ask them directly: @jameslamb @StrikerRUS @shiyu1994 @trivialfis @hcho3 your opinion would be much appreciated. We are trying to transform metadata on its way to the pipeline step where it is needed, e.g. validation data for early stopping in GBTs; see https://github.com/scikit-learn/scikit-learn/pull/28901#discussion_r1740806300 (StandardScaler is just for demonstration purposes).

lorentzenchr avatar Sep 02 '24 18:09 lorentzenchr

Thank you for the ping. There's no particular reason for XGBoost; it was developed in the early days before the sklearn estimator guideline existed. If there's a new best practice, please share it, and we will be happy to make the change to comply with the new estimator guideline.

On the other hand, we sometimes use multiple evaluation datasets. For example, we might monitor the progress for both training and validation instead of only validation. Sometimes, tracking the training accuracy might not be necessary, and computation performance can be improved by evaluating only the validation dataset. There are other cases for using custom evaluation datasets as well. Leaving it as a user choice makes sense. Therefore, the option of using a variable sequence of datasets remains unchanged.

trivialfis avatar Sep 03 '24 08:09 trivialfis

If the API changes to est.fit(X, y, X_val=(X1, X2), y_val=(y1, y2)), then we could have a special case for tuples in our mechanism to transform every element of the X_val tuple, and the user code would look like:

X, y = make_regression(n_samples=200, n_features=500, n_informative=5, random_state=0)
X[:2,] = X[:2,] + 20

# Validation set chosen before looking at the data.
X_val, y_val = X[:50,], y[:50,]
X, y = X[50:,], y[50:,]

est_gs = GridSearchCV(
    make_pipeline(
        StandardScaler(),
        XGBRegressor(
            early_stopping_rounds=3,
        ).set_fit_request(X_val=True, y_val=True),
        # telling pipeline to transform these inputs up to the step which is
        # requesting them.
        transform_input=["X_val", "y_val"],
    ),
    param_grid={"xgbregressor__max_depth": list(range(5))},
    cv=5,
).fit(X, y, X_val=(X, X_val), y_val=(y, y_val))
# this passes X_val, y_val to Pipeline, and Pipeline knows how to deal with
# them.
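
A minimal sketch of the "special case for tuple" on the Pipeline side (illustrative only; the helper name is made up and this is not the PR's actual implementation):

def _transform_metadata_value(fitted_steps, value):
    """Transform one routed metadata value with the already-fitted steps.

    Tuples are handled element-wise, so X_val=(X, X_val) becomes
    (scaler.transform(X), scaler.transform(X_val)) after a StandardScaler step.
    """
    def _apply(v):
        for _, step in fitted_steps:
            v = step.transform(v)
        return v

    if isinstance(value, tuple):
        return tuple(_apply(v) for v in value)
    return _apply(value)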

adrinjalali avatar Sep 03 '24 08:09 adrinjalali

Thanks a lot for the invitation to the discussion!

If there's a new best practice, please share it, and we will be happy to make the change to comply with the new estimator guideline.

The same goes for LightGBM. The interface being discussed here was developed a very long time ago, and I believe it was simply a replication of the XGBoost interface that existed at the time, so as not to overcomplicate the user experience.

On the other hand, we sometimes use multiple evaluation datasets.

Indeed true! This feature of multiple validation sets should be preserved, I think. Linking some related discussions:

  • https://github.com/microsoft/LightGBM/issues/4981

    I implemented multiclass logloss as a custom loss function, and trained while evaluating on 3 validation sets: the training data, the training data in shuffled order, and a heldout set.

  • https://github.com/microsoft/LightGBM/issues/6360

then we could have a special case for tuples in our mechanism to transform every element of the X_val tuple

Will it play nice with some other data fit() arguments like sample_weight/eval_sample_weight and init_score/eval_init_score?

StrikerRUS avatar Sep 05 '24 23:09 StrikerRUS

ping @jeremiedbb for backport in the branch

glemaitre avatar Nov 15 '24 14:11 glemaitre