Sonoff Zigbee 3.0 USB Dongle-P won't setup ZHA integration
The problem
The Sonoff dongle won't set up the ZHA integration at all. After I press "create a network" I get a "Loading next step for zha" window, followed by an "error" message that doesn't specify what the error is actually about.
What version of Home Assistant Core has the issue?
2024.8.3
What was the last working version of Home Assistant Core?
No response
What type of installation are you running?
Home Assistant OS
Integration causing the issue
ZHA
Link to integration documentation on our website
https://www.home-assistant.io/integrations/zha/
Diagnostics information
I enabled ZHA logging; here is the relevant part of the log after rebooting the system and trying to set up the integration.
home-assistant_2024-08-27T19-35-25.304Z.log
Example YAML snippet
No response
Anything in the logs that might be useful for us?
No response
Additional information
No response
While LDA was a bit hard to port over to dask, PCA worked perfectly out of the box! (which I think is a pretty big win for the array API)
The other preprocessing/metrics/tools also worked out of the box, I think (at least judging by the tests).
Re: performance
Testing out of core
For dask arrays generated by dask-ml with parameters
n_samples=100_000
n_classes=2
n_informative=5
on a Gitpod machine with 2 cores and 8 GB RAM, I get
- 47s for 10,000,000 × 20 LDA (1.49 GB)
- 6m 3.5s for 50,000,000 × 20 LDA (7.45 GB)
- 14m 51s for 100,000,000 × 20 LDA (14.90 GB)
which I think is pretty decent scaling
Distributed computation
For
n_samples=20,000,000
n_classes=2
n_informative=5
and chunksize=100,000
I am measuring a 45s runtime with 4 workers (2 CPU, 8 GB RAM), and a 4min 29s runtime with a single worker (note: there was a bit of spilling on this one).
cc @betatim @ogrisel
FYI, array-api-compat fixes are ongoing here https://github.com/data-apis/array-api-compat/pull/110
Since array-api-compat 1.5.1 came out and CI is green here, I'm going to mark this PR as ready for review.
The only other change I'm planning right now is splitting out the LDA changes, since that requires a patch to dask itself.
The correct way to handle laziness is also something that might be good to think about. (It might be good to loop in more scikit-learn devs about this).
Thanks @lithomas1! Could you please add an entry referencing this PR in doc/whats_new/v1.5.rst? There is a dedicated section for Array API changes. This should fix the "Check Changelog" CI failure.
The only other change I'm planning right now is splitting out the LDA changes, since that requires a patch to dask itself.
Sounds good, however in the last CI run with dask I see 3 failures:
FAILED ::__init__.py::__init__.py::test_common.py::test_array_api_compliance[r2_score-check_array_api_regression_metric-dask.array-None-None] - ValueError: Boolean index assignment in Dask expects equally shaped arrays.
FAILED ::__init__.py::test_common.py::test_estimators[LinearDiscriminantAnalysis()-check_array_api_input(array_namespace=dask.array,dtype_name=None,device=None,skip_methods={})] - ValueError: An error occurred while calling the wrap_func_shape_as_first_ar...
FAILED ::__init__.py::__init__.py::test_multiclass.py::test_is_multilabel_array_api_compliance[dask.array-None-None] - AssertionError: is_multilabel(dask.array<array, shape=(10, 10), dtype=int64...
only one of them seems directly related to the lazy shape needed for LinearDiscriminantAnalysis.
The failures for check_array_api_regression_metric with r2_score and is_multilabel seem to be of a different nature, but I would need to investigate with a debugger to confirm.
The correct way to handle laziness is also something that might be good to think about. (It might be good to loop in more scikit-learn devs about this).
I am personally fine with some kind of "trigger compute now if lazy" helper in array-api-compat as a short-term solution. It would let us experiment with lazy Array API libraries in scikit-learn and see empirically how much we have to rely on it, which would in turn inform the decision, for a future version of the spec itself, to handle laziness explicitly with an official API.
Here is the info about the r2_score metric. It's indeed a problem with lazy shapes:
> /Users/ogrisel/code/scikit-learn/sklearn/metrics/_regression.py(884)_assemble_r2_explained_variance()
-> output_scores[valid_score] = 1 - (
(Pdb) l
879 # (note: even if denominator is zero, thus avoiding NaN scores)
880 output_scores = xp.ones([n_outputs], device=device, dtype=dtype)
881 # Non-zero Numerator and Non-zero Denominator: use the formula
882 valid_score = nonzero_denominator & nonzero_numerator
883
884 -> output_scores[valid_score] = 1 - (
885 numerator[valid_score] / denominator[valid_score]
886 )
887
888 # Non-zero Numerator and Zero Denominator:
889 # arbitrary set to 0.0 to avoid -inf scores
(Pdb) p valid_score
dask.array<and_, shape=(2,), dtype=bool, chunksize=(2,), chunktype=numpy.ndarray>
(Pdb) p numerator
dask.array<sum-aggregate, shape=(2,), dtype=float64, chunksize=(2,), chunktype=numpy.ndarray>
(Pdb) p denominator
dask.array<sum-aggregate, shape=(2,), dtype=float64, chunksize=(2,), chunktype=numpy.ndarray>
(Pdb) p numerator[valid_score]
dask.array<getitem, shape=(nan,), dtype=float64, chunksize=(nan,), chunktype=numpy.ndarray>
(Pdb) p 1 - (numerator[valid_score] / denominator[valid_score])
dask.array<sub, shape=(nan,), dtype=float64, chunksize=(nan,), chunktype=numpy.ndarray>
(Pdb) p output_scores
dask.array<ones_like, shape=(2,), dtype=float64, chunksize=(2,), chunktype=numpy.ndarray>
This causes:
E ValueError: Boolean index assignment in Dask expects equally shaped arrays.
E Example: da1[da2] = da3 where da1.shape == (4,), da2.shape == (4,) and da3.shape == (4,).
E Alternatively, you can use the extended API that supports indexing with tuples.
E Example: da1[(da2,)] = da3.
I wonder what should be done in this case. Trigger an explicit computation prior to assignment? In this particular case, dask might be able to perform some shape inference from the lazy info alone to see that the masked boolean assignment has compatible shapes, even if they are all nans: they are compatible nans.
For information, the tuple indexing syntax suggested in the error message would not work for this particular case:
diff --git a/sklearn/metrics/_regression.py b/sklearn/metrics/_regression.py
index ad5c76810f..f0931045ab 100644
--- a/sklearn/metrics/_regression.py
+++ b/sklearn/metrics/_regression.py
@@ -881,7 +881,7 @@ def _assemble_r2_explained_variance(
# Non-zero Numerator and Non-zero Denominator: use the formula
valid_score = nonzero_denominator & nonzero_numerator
- output_scores[valid_score] = 1 - (
+ output_scores[(valid_score,)] = 1 - (
numerator[valid_score] / denominator[valid_score]
)
would lead to:
E ValueError: Arrays chunk sizes are unknown: (nan,)
E
E A possible solution: https://docs.dask.org/en/latest/array-chunks.html#unknown-chunks
E Summary: to compute chunks sizes, use
E
E x.compute_chunk_sizes() # for Dask Array `x`
E ddf.to_dask_array(lengths=True) # for Dask DataFrame `ddf`
The is_multilabel failure is indeed also related to nan-valued lazy shape items in the expression labels.shape[0] < 3, because labels' shape is value-dependent: labels = xp.unique_values(y):
https://github.com/scikit-learn/scikit-learn/blob/0e19a4822ff49951d2a7606444a1a6085c32b56b/sklearn/utils/multiclass.py#L193-L198
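A quick demonstration of that value-dependent lazy shape (illustrative only):
import dask.array as da

labels = da.unique(da.from_array([0, 1, 1, 0], chunks=2))
print(labels.shape)     # (nan,): the number of unique values is unknown before compute
print(labels.shape[0] < 3)  # comparing against nan is meaningless, hence the failure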
So to conclude, we have identified only two types of problems related to lazy evaluation in dask:
- decisions in the code based on lazy shape item values;
- masked assignments with the same lazy boolean array on each side.
Shall we create two issues upstream and link to this PR?
In the meantime we could either:
- introduce a temporary, dask-specific eager-evaluation private helper in scikit-learn and use it sparingly whenever we need value-dependent branching in our code (see the sketch after this list);
- introduce such a public helper in array-api-compat and use it in scikit-learn to unblock this PR.
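Something as simple as this could do, as a very rough sketch (the name and the duck-typed check are assumptions, not an agreed design):
def _eager_if_lazy(array):
    """Trigger computation on a lazy (e.g. dask) array so that
    value-dependent branching can proceed; pass eager arrays through."""
    if hasattr(array, "compute"):  # duck-typing for dask-like lazy arrays
        return array.compute()
    return array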
Sorry, I intentionally pulled out all my changes related to triggering compute on lazy shapes last night (and went to bed before leaving a comment ...), since I realized both of them depend on changing dask behavior/fixing bugs in dask.
For LDA, there's an annoying race condition; if I were able to patch it in dask, we would be able to use the shape helper I mentioned above.
r2_score/is_multilabel hit the issue where, when dask does boolean assignment with arrays that have a lazy shape, it doesn't automatically call compute on the operands and errors out with a shape mismatch. So, we would have to insert shape helper calls in arbitrary places, which would go against the idea of being able to write code for both lazy and eager array implementations without having to worry about laziness.
In this case, we might have to go for the heavier approach that @betatim mentioned of wrapping the dask array class itself and calling compute on lazy arrays in __setitem__ (and other methods if necessary) before handing the __setitem__ off to dask.
(I haven't worked out the details yet, but my first thought is that we could return wrapped arrays from the array_namespace call. A rough sketch follows.)
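Back-of-the-envelope sketch of that idea (the class is hypothetical; nothing like it is implemented in this PR):
import dask.array as da

class _ComputingDaskArray:
    """Proxy that materializes lazy operands before delegating
    __setitem__ to the underlying dask array."""

    def __init__(self, array):
        self._array = array

    def __setitem__(self, key, value):
        # Compute lazy boolean masks/values so dask sees concrete shapes.
        if isinstance(key, da.Array):
            key = key.compute()
        if isinstance(value, da.Array):
            value = value.compute()
        self._array[key] = value

    def __getattr__(self, name):
        # Delegate everything else to the wrapped dask array.
        return getattr(self._array, name)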
IMO, right now, having these two things not work is not a big deal for dask support in scikit-learn (especially since the estimators/methods in progress right now, e.g. #27800 for RidgeCV and #28106 for mean_tweedie_deviance, look like they should work out of the box without any changes, at first glance).
So, @betatim @ogrisel, what do you think about just adding skips where necessary for any methods/estimators that dask doesn't support out of the box, and collecting the data in a doc/GitHub issue somewhere, so that we can have a more complete picture of what's needed to support laziness?
(I'm happy to transcribe our conversations on this topic (and planned solutions) so that we don't lose them.)
I am fine with this if you find a concise way to add the necessary skips with links to one or more dedicated issues in either scikit-learn or upstream issue trackers.
This should be ready for another pass now.
The primary mechanism for skips is the _array_api_skips dictionary in sklearn/utils/_array_api.py. I also added a skip_dask fixture/decorator for individual tests that check array API compliance.
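As a rough illustration of the shape such a registry could take (the entries below are made up for illustration, not the PR's actual contents):
_array_api_skips = {
    "dask.array": {
        "LinearDiscriminantAnalysis": "lazy shapes, see dask/dask#11042",
        "r2_score": "boolean mask assignment needs known chunk sizes",
    },
}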
BTW, I launched the tests on hosts with cuda and MPS devices to confirm that this PR did not introduce any regression for those.
Here is another pass of feedback, possibly final on my end.
Would it be possible to open issues upstream, either in the dask or array-api-compat repos or in the Array API spec repo, to link to from the comments of the skipped tests?
Yep, I'm tracking failures in https://github.com/scikit-learn/scikit-learn/issues/26724#issuecomment-2038674794
Failures are listed as checkboxes (with accompanying upstream issues if necessary).
Thanks for the reviews. I've updated the PR following the feedback.
Are you comfortable with merging partial dask support with skipped tests? Or do you prefer to implement a more complete support first?
Unsure. I started looking at the diff just now. I think if some estimators work and some don't, that is one thing. But if only some of an estimator works, that is a bit weird. From a user's perspective it would be weird, I think, because you can't really do much if only part of an estimator works.
Can we use array-api-compat to add slogdet to dask arrays? Then we'd avoid the problem that an estimator semi works.
For the two other skips (LinearDiscriminantAnalysis and r2_score), what is the error/exception a user will see when they try to use them with a dask array? Should we have a dedicated exception message for this case or is "random error", like you get when you use an estimator that doesn't yet have Array API support, good enough?
Should we have a dedicated exception message for this case or is "random error", like you get when you use an estimator that doesn't yet have Array API support, good enough?
This is going to be hard (impossible) to do in a library-agnostic way. Note that we already get low-level error messages with dask arrays on main:
>>> import sklearn
>>> sklearn.set_config(array_api_dispatch=True)
>>> import dask.array as da
>>> X = da.random.normal(size=(1000, 10))
>>> y = da.random.randint(0, 2, size=1000)
>>> from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
>>> LinearDiscriminantAnalysis().fit(X, y)
Traceback (most recent call last):
File ~/miniforge3/envs/dev/lib/python3.11/site-packages/dask/backends.py:140 in wrapper
return func(*args, **kwargs)
File ~/miniforge3/envs/dev/lib/python3.11/site-packages/dask/array/wrap.py:63 in wrap_func_shape_as_first_arg
parsed = _parse_wrap_args(func, args, kwargs, shape)
File ~/miniforge3/envs/dev/lib/python3.11/site-packages/dask/array/wrap.py:33 in _parse_wrap_args
chunks = normalize_chunks(chunks, shape, dtype=dtype)
File ~/miniforge3/envs/dev/lib/python3.11/site-packages/dask/array/core.py:3095 in normalize_chunks
chunks = auto_chunks(chunks, shape, limit, dtype, previous_chunks)
File ~/miniforge3/envs/dev/lib/python3.11/site-packages/dask/array/core.py:3210 in auto_chunks
raise ValueError(
ValueError: Can not perform automatic rechunking with unknown (nan) chunk sizes.
A possible solution: https://docs.dask.org/en/latest/array-chunks.html#unknown-chunks
Summary: to compute chunks sizes, use
x.compute_chunk_sizes() # for Dask Array `x`
ddf.to_dask_array(lengths=True) # for Dask DataFrame `ddf`
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
Cell In[11], line 1
LinearDiscriminantAnalysis().fit(X, y)
File ~/code/scikit-learn/sklearn/base.py:1474 in wrapper
return fit_method(estimator, *args, **kwargs)
File ~/code/scikit-learn/sklearn/discriminant_analysis.py:638 in fit
self._solve_svd(X, y)
File ~/code/scikit-learn/sklearn/discriminant_analysis.py:510 in _solve_svd
self.means_ = _class_means(X, y)
File ~/code/scikit-learn/sklearn/discriminant_analysis.py:116 in _class_means
means = xp.zeros((classes.shape[0], X.shape[1]), device=device(X), dtype=X.dtype)
File ~/miniforge3/envs/dev/lib/python3.11/site-packages/array_api_compat/_internal.py:28 in wrapped_f
return f(*args, xp=xp, **kwargs)
File ~/miniforge3/envs/dev/lib/python3.11/site-packages/array_api_compat/common/_aliases.py:134 in zeros
return xp.zeros(shape, dtype=dtype, **kwargs)
File ~/miniforge3/envs/dev/lib/python3.11/site-packages/dask/backends.py:142 in wrapper
raise type(e)(
ValueError: An error occurred while calling the wrap_func_shape_as_first_arg method registered to the numpy backend.
Original Message: Can not perform automatic rechunking with unknown (nan) chunk sizes.
A possible solution: https://docs.dask.org/en/latest/array-chunks.html#unknown-chunks
Summary: to compute chunks sizes, use
x.compute_chunk_sizes() # for Dask Array `x`
ddf.to_dask_array(lengths=True) # for Dask DataFrame `ddf`
So from this point of view, this PR is not making things worse than they currently are. At least we document that dask support is not fully functional yet.
Can we use array-api-compat to add slogdet to dask arrays? Then we'd avoid the problem that an estimator semi works.
This is not possible yet.
Dask doesn't expose a way to compute a determinant yet, and there doesn't seem to be an easy workaround, since dask's LU/QR decompositions don't provide enough information to do it. I've opened an issue on the dask side, but there doesn't seem to be a response yet: https://github.com/dask/dask/issues/11042
I agree that it's not ideal that dask can't use score, but this is something that I think is reasonable to have users work around for now, e.g. by using _estimator_with_converted_arrays to convert the estimator from dask to numpy arrays (similar to how we transfer arrays from GPU to CPU with cupy).
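For example, something like this should work: a rough sketch patterned on the cupy-to-numpy recipe, assuming the dask support from this PR is in place:
import dask.array as da
import sklearn
from sklearn.decomposition import PCA
from sklearn.utils._array_api import _estimator_with_converted_arrays

sklearn.set_config(array_api_dispatch=True)

X = da.random.normal(size=(1000, 10), chunks=(100, 10))
pca = PCA(n_components=2).fit(X)

# Materialize the fitted dask attributes into numpy arrays so that score
# (which needs slogdet) can run on numpy inputs.
to_numpy = lambda a: a.compute() if isinstance(a, da.Array) else a
pca_numpy = _estimator_with_converted_arrays(pca, to_numpy)
print(pca_numpy.score(X.compute()))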
I'm not sure how important score is though. There seem to be some usages of it, but curiously there seem to be no examples in scikit-learn itself on using score.
https://github.com/search?q=repo%3Ascikit-learn%2Fscikit-learn+pca.score&type=code
My feeling overall is that adding dask support leaves us with something that is only half finished for the (very few) existing estimators. My first reaction to that is to wait a bit until Array API support in dask has progressed a bit more. My assumption is that there is no need to be in a hurry here.
I'm not sure how important score is though. There seem to be some usages of it, but curiously there seem to be no examples in scikit-learn itself on using score.
The score method is implicitly used by tools such as cross_val_score or GridSearchCV. However, it is true that it is very rarely used for PCA alone in practice. It's mostly used for supervised learning pipelines.
I agree that it's not ideal that dask can't use score, but this is something that I think is reasonable to have users work around for now, e.g. by using _estimator_with_converted_arrays to convert the estimator from dask to numpy arrays (similar to how we transfer arrays from GPU to CPU with cupy).
It might also be the case that the estimator's score method can itself convert arrays to numpy if the namespace does not provide the necessary xp.linalg.slogdet function. For truncated PCA, we can expect the fit method (and maybe get_precision) to be slow and worth running on an accelerated namespace, while the final xp.linalg.slogdet call should not be a performance-critical operation (and the result is a scalar).
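Something like this hypothetical helper (illustrative only, not actual scikit-learn code):
import numpy as np

def _logdet_with_fallback(xp, precision):
    """Use xp.linalg.slogdet when the namespace provides it; otherwise pull
    the (small) precision matrix back to numpy for the final scalar."""
    linalg = getattr(xp, "linalg", None)
    if linalg is not None and hasattr(linalg, "slogdet"):
        return linalg.slogdet(precision)[1]
    # np.asarray materializes dask arrays; cheap since precision is small.
    return np.linalg.slogdet(np.asarray(precision))[1]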
Let's convert to draft for the time being then.
Thanks for updating.
I'm back to working on this again now that my summer is over.
From what I remember, I think I had ridge regression working locally (after wrangling some issues with data-dependent output shapes, which I think I worked around with where).
I'm going to try to fix some more issues upstream in dask, like the lack of sorting, which seems to account for a majority of the failures, and also linalg.
Re: dynamic shapes: Have you tried running sklearn functions through jax with JIT on? I believe dynamic shapes aren't an issue in jax without the JIT.
Have you tried running sklearn functions through jax with JIT on?
No I have not tried.
For the record, #30340 was recently merged to main, so this might unblock this PR: we can now leverage the dask compat features implemented in array-api-extra.
Thanks, I'll rebase this PR this week and let you know how it goes.
Made a bit of progress. Some notes I made along the way are:
TODOs
- [ ] Upstream fill_diagonal to array-api-extra (5 uses in sklearn)
- [ ] Upstream isin/in1d to array-api-extra (used in encoders)
Changes
- had to disable the shape check in _average for lazy backends
Blockers
- [ ] setdiff1d in array-api-extra only works on eager arrays
  - Probably the next major thing to work on (this blocks LabelEncoder from working)
  - The remaining 13 metrics tests also depend on this
Progress
- Fixed PCA (was broken due to fill_diagonal)
- Fixed the input validators (e.g. is_multilabel)
- nunique in array-api-extra is very helpful so far (see the sketch after this list)
  - Able to do int(nunique(...)) to get n_classes
    - This forces a compute (but shouldn't be so bad if we're only doing this on y)
  - Going to try to apply this to LDA later on
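A tiny illustration of that pattern (assumes array-api-extra and a dask-backed y; the int() call is what forces the compute):
import array_api_extra as xpx
import dask.array as da

y = da.from_array([0, 1, 1, 0, 2], chunks=2)
n_classes = int(xpx.nunique(y))  # materializes the lazy count
print(n_classes)  # 3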
Thoughts
- nunique works, but it'd be better if it were part of a unique function
  - Right now, we have to do the unique computation twice: once to find the unique values and once for nunique
- If we want to support lazy backends, there needs to be some thought as to how error checking should work
  - Right now, I disable the checks, since for dask we don't know the shape until we compute. The alternative would be to defer them so they raise at compute time (see the sketch below).
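One way to defer a check to compute time, as a sketch (hypothetical helper using dask's map_blocks; not part of this PR):
import dask.array as da
import numpy as np

def _check_non_negative_lazy(x):
    """Embed the validation in the task graph so it raises at compute time
    instead of requiring an eager check up front."""
    def _check(block):
        if (block < 0).any():
            raise ValueError("Negative values are not allowed.")
        return block
    return x.map_blocks(_check, dtype=x.dtype)

x = _check_non_negative_lazy(da.from_array(np.array([1.0, 2.0, -3.0]), chunks=2))
x.sum().compute()  # raises ValueError only now, when the graph runs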
I haven't touched this PR in a while (besides rebasing). I've opened an issue at array-api-extra (https://github.com/data-apis/array-api-extra/issues/268) for the missing functions, and I think they're in favor of adding them.
Hopefully I can make the PRs into array-api-extra and/or finish debugging LDA next week.