Peter Hausamann

25 comments by Peter Hausamann

Hi Noah, I've looked into it and I think this would be a very valuable contribution; however, we'd have to approach it at a higher level. Here's an example that...

> First, there are two approaches to wrapping an sklearn pipeline. The first is to wrap each individual transformer with xarray methods, which is what it seems you are trying...

[Here's an example from the docs](https://phausamann.github.io/sklearn-xarray/content/transformers.html#transformers-changing-the-number-of-samples). Basically, the `Sanitizer` removes samples from the dataset, which would not work in a normal pipeline because X and y would have an inconsistent...
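To see why a sample-dropping transformer breaks a plain sklearn pipeline, here is a toy sketch (the `DropNaNRows` transformer is made up for illustration, it is not the actual `Sanitizer`): only X flows through the pipeline's transform steps, so y keeps its original length and the downstream fit fails.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline

class DropNaNRows(BaseEstimator, TransformerMixin):
    """Toy transformer that removes samples containing NaN."""
    def fit(self, X, y=None):
        return self
    def transform(self, X):
        return X[~np.isnan(X).any(axis=1)]

X = np.array([[1.0], [np.nan], [3.0], [4.0]])
y = np.array([1.0, 2.0, 3.0, 4.0])

pipe = Pipeline([("drop", DropNaNRows()), ("reg", LinearRegression())])
# Raises: X has 3 samples after dropping, but y still has 4 --
# a plain Pipeline never adjusts y to match.
pipe.fit(X, y)
```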

Yeah, totally! There is btw also the possibility to use wrapped estimators in a pipeline with plain numpy arrays; the wrapped estimator determines its input type at `fit` time... so...
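A minimal sketch of that behavior, assuming the package's top-level `wrap` function (as in the project docs); the numpy pass-through is as described in the comment above:

```python
import numpy as np
import xarray as xr
from sklearn.preprocessing import StandardScaler

from sklearn_xarray import wrap  # top-level wrapper, per the project docs

# Fit with a DataArray: dims and coords survive the transform.
X_xr = xr.DataArray(np.random.rand(10, 3), dims=("sample", "feature"))
print(type(wrap(StandardScaler()).fit_transform(X_xr)))  # xarray DataArray

# The same wrapper also accepts a plain numpy array; the input type is
# only determined once fit is called.
X_np = np.random.rand(10, 3)
print(type(wrap(StandardScaler()).fit_transform(X_np)))  # plain ndarray
```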

Thanks for the explanation, I see your point now. I think it would be very useful to have a mechanism to parallelize part of the pipeline on a per-variable basis....
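One way to picture that fan-out, sketched with joblib (the helper name and the joblib approach are illustrative, not the proposed API):

```python
import numpy as np
import xarray as xr
from joblib import Parallel, delayed
from sklearn.base import clone
from sklearn.preprocessing import StandardScaler

def _fit_transform_var(estimator, da):
    # Fit an independent clone per variable so no state is shared.
    return clone(estimator).fit_transform(da.values)

def fit_transform_per_variable(ds, estimator, n_jobs=2):
    """Hypothetical helper: apply a fresh clone of `estimator` to each
    data variable of `ds` in parallel and rebuild the Dataset."""
    names = list(ds.data_vars)
    results = Parallel(n_jobs=n_jobs)(
        delayed(_fit_transform_var)(estimator, ds[name]) for name in names
    )
    return xr.Dataset(
        {name: (ds[name].dims, out) for name, out in zip(names, results)}
    )

ds = xr.Dataset({
    "temperature": (("sample", "feature"), np.random.rand(10, 3)),
    "pressure": (("sample", "feature"), np.random.rand(10, 3)),
})
print(fit_transform_per_variable(ds, StandardScaler()))
```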

I feel like it should be possible to combine the two ideas into one single estimator that both (a) applies a transformer or pipeline to each variable in the dataset...
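Something along these lines, perhaps (a hypothetical sketch; the class name and the dict handling are illustrative, not settled API, and the transformer is assumed to preserve each variable's shape):

```python
import xarray as xr
from sklearn.base import BaseEstimator, TransformerMixin, clone

class PerVariableTransformer(BaseEstimator, TransformerMixin):
    """Sketch of the combined idea: apply a transformer to each variable
    of a Dataset, with an optional dict for per-variable estimators."""

    def __init__(self, transformer):
        # Either a single estimator or a dict keyed by variable name.
        self.transformer = transformer

    def _get(self, name):
        if isinstance(self.transformer, dict):
            return self.transformer[name]
        return self.transformer

    def fit(self, X, y=None):
        # One independent, cloned estimator per data variable.
        self.estimators_ = {
            name: clone(self._get(name)).fit(X[name].values, y)
            for name in X.data_vars
        }
        return self

    def transform(self, X):
        return xr.Dataset({
            name: (X[name].dims, est.transform(X[name].values))
            for name, est in self.estimators_.items()
        })
```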

> I definitely think the dictionary input idea is good, but I think it is better to provide it as a function, sort of like how sklearn has `make_union` and...
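For comparison, sklearn's `make_union` is just a thin factory over `FeatureUnion`, and a dictionary-input analog could be equally thin (the factory name is invented here and it builds on the `PerVariableTransformer` sketch above):

```python
from sklearn.decomposition import PCA
from sklearn.pipeline import make_union
from sklearn.preprocessing import StandardScaler

# sklearn's existing pattern: positional estimators, auto-named steps.
union = make_union(StandardScaler(), PCA(n_components=2))

# A hypothetical analog: variable names as keyword arguments.
def make_per_variable(**transformers):
    return PerVariableTransformer(dict(transformers))  # class sketched above

est = make_per_variable(temperature=StandardScaler(), pressure=PCA(n_components=2))
```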

Anyway, these things are just technicalities, I think you can start working on a PR and we'll continue the discussion there. I've put some information together in the [wiki](https://github.com/phausamann/sklearn-xarray/wiki).

On second thought, class decorators seem like a bad idea, mostly because the resulting object is not pickleable. It makes more sense for each estimator to subclass the corresponding wrapper,...
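The pickling problem is easy to reproduce: a class created inside a decorator body has a qualified name that pickle cannot resolve at module level (a toy demonstration, not the actual decorator that was considered):

```python
import pickle

def add_xarray_support(cls):
    # Returns a *new* class defined inside the decorator body.
    class Wrapper(cls):
        pass
    return Wrapper

@add_xarray_support
class MyEstimator:
    pass

try:
    pickle.dumps(MyEstimator())
except (pickle.PicklingError, AttributeError) as err:
    # pickle looks the class up by __qualname__, which here is
    # "add_xarray_support.<locals>.Wrapper" -- not importable.
    print(err)

# Subclassing the wrapper at module level instead keeps the class
# importable by name, so instances pickle normally.
```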

The benefit of this approach would be that each estimator could inherit the methods it needs from the corresponding mixin, e.g. `class PCA(_CommonEstimatorWrapper, _ImplementsTransformMixin, _ImplementsScoreMixin)`. In some cases, the class...
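A rough sketch of how those mixins could fit together (only the class names come from the discussion; the method bodies are my guesses at the delegation):

```python
from sklearn import decomposition

class _CommonEstimatorWrapper:
    """Shared wrapping logic; a real version would restore dims/coords."""
    _estimator_cls = None

    def __init__(self, **params):
        self.estimator_ = self._estimator_cls(**params)

    def fit(self, X, y=None):
        # Accept either an xarray object or a plain array.
        self.estimator_.fit(getattr(X, "values", X), y)
        return self

class _ImplementsTransformMixin:
    def transform(self, X):
        return self.estimator_.transform(getattr(X, "values", X))

class _ImplementsScoreMixin:
    def score(self, X, y=None):
        return self.estimator_.score(getattr(X, "values", X), y)

# Each wrapped estimator inherits only the methods its underlying
# sklearn estimator actually implements:
class PCA(_CommonEstimatorWrapper, _ImplementsTransformMixin, _ImplementsScoreMixin):
    _estimator_cls = decomposition.PCA
```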