sklearn `Pipeline` support and examples

Open • mjbommar opened this issue 3 years ago • 3 comments

Feature Description

DP is valuable in many "traditional" machine learning pipelines, and sklearn is the largest "traditional" ML ecosystem in Python. Would examples or first-class support for scikit-learn Pipeline workflows be worth contributing? We (@licensio) would be happy to contribute this via PR.

Is your feature request related to a problem?

The "framework-free" examples could easily be adapted to sklearn workflows, but usage would be substantially more concise with proper `sklearn.pipeline.Pipeline` support.

What alternatives have you considered?

As discussed above, sklearn users could adapt the framework-agnostic examples.

Additional Context

N/A

mjbommar • Jan 28 '22 17:01

Thanks for the suggestion, Michael! It sounds interesting. We're open to adding native support for different APIs (though having an example is a good start). Better integration with the Python ecosystem is on our roadmap.

Let's first understand what it might look like. I'm not familiar with scikit-learn Pipeline workflows (I've only quickly checked the documentation). Could you please explain your ideas for an example that uses PipelineDP together with a scikit-learn Pipeline?

dvadym • Jan 29 '22 17:01

Let me work up a few options. I think there are two distinct use cases: one for unsupervised workflows (e.g., clustering) and one for supervised workflows (e.g., regression).

In the meantime, here are a few more references that might be helpful if you are curious:

  • https://github.com/scikit-learn/scikit-learn/blob/main/sklearn/feature_extraction/text.py#L557
  • https://github.com/scikit-learn/scikit-learn/blob/main/sklearn/preprocessing/_function_transformer.py#L19
  • https://scikit-learn.org/stable/modules/preprocessing.html
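
For a rough sense of the shape this could take, here is a minimal sketch of an unsupervised-style step. It does not use PipelineDP at all: `LaplaceColumnMeans` and `clip_to_unit_interval` are hypothetical names made up for illustration, and the noise is a plain NumPy Laplace mechanism rather than a hardened DP library. It assumes one row per individual, a public row count, and features already known to lie in `[lower, upper]`.

```python
import numpy as np
from sklearn.base import BaseEstimator
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import FunctionTransformer


class LaplaceColumnMeans(BaseEstimator):
    """Hypothetical final pipeline step (not a PipelineDP or sklearn class).

    Releases per-column means with Laplace noise. Assumes one row per
    individual and a publicly known number of rows.
    """

    def __init__(self, epsilon=1.0, lower=0.0, upper=1.0, random_state=None):
        self.epsilon = epsilon
        self.lower = lower
        self.upper = upper
        self.random_state = random_state

    def fit(self, X, y=None):
        # Clip so each value's contribution is bounded.
        X = np.clip(np.asarray(X, dtype=float), self.lower, self.upper)
        n_rows, n_cols = X.shape
        # Replacing one row moves each column mean by at most this much.
        per_column_sensitivity = (self.upper - self.lower) / n_rows
        # L1 sensitivity over all columns is n_cols times that, so the
        # Laplace scale below spends `epsilon` in total.
        scale = per_column_sensitivity * n_cols / self.epsilon
        rng = np.random.default_rng(self.random_state)
        self.means_ = X.mean(axis=0) + rng.laplace(0.0, scale, size=n_cols)
        return self


def clip_to_unit_interval(X):
    # Stateless preprocessing: does not learn anything from the data.
    return np.clip(X, 0.0, 1.0)


pipeline = Pipeline([
    ("clip", FunctionTransformer(clip_to_unit_interval)),
    ("dp_means", LaplaceColumnMeans(epsilon=1.0, random_state=0)),
])

# Synthetic data, for illustration only.
X = np.random.default_rng(42).uniform(0.0, 1.0, size=(1000, 3))
pipeline.fit(X)
noisy_means = pipeline.named_steps["dp_means"].means_
print(noisy_means)
```

Only `means_` is meant to be released here. Any data-dependent preprocessing step placed before the DP step (for example a fitted scaler) would change the sensitivity analysis, which is one reason first-class support might be preferable to ad hoc wrappers.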

mjbommar • Jan 29 '22 19:01

Hey Michael:

A couple of our internal teams have been thinking about this. Would you be open to a 30-minute chat on the topic?

miguelagt • Feb 02 '22 00:02