
Add helper method to combine pipelines

Open • angela97lin opened this issue 3 years ago • 6 comments

Separating out work from https://github.com/alteryx/evalml/issues/2058: https://github.com/alteryx/evalml/pull/2968 tackled the first half, creating a preprocessing pipeline that encompasses all of the components created from data check actions.

This issue will tackle the second half: adding a utility method to combine a preprocessing pipeline with another pipeline.
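A minimal sketch of what such a helper could look like, assuming pipelines are described by plain dictionaries modeled on evalml's component-graph format (node name mapped to `[component name, *inputs]`, where each input is `"X"`, `"y"`, or `"<node>.x"` / `"<node>.y"`). The names `combine_sequential` and `final_node` are hypothetical and not part of evalml's API. The helper finds the single terminal node of the preprocessing graph and rewires every raw `"X"` input of the second graph to read that node's output instead; it assumes preprocessing leaves `y` unchanged.

```python
from typing import Dict, List

# Hypothetical format modeled on evalml's component-graph dicts:
# node name -> [component name, *inputs], where each input is "X", "y",
# or "<other node>.x" / "<other node>.y".
ComponentDict = Dict[str, List[str]]


def final_node(graph: ComponentDict) -> str:
    """Return the one node whose output no other node in `graph` consumes."""
    consumed = set()
    for _, *inputs in graph.values():
        for inp in inputs:
            if inp not in ("X", "y"):
                consumed.add(inp.rsplit(".", 1)[0])  # "Imputer.x" -> "Imputer"
    sinks = [name for name in graph if name not in consumed]
    if len(sinks) != 1:
        raise ValueError(f"expected exactly one final node, found {sinks}")
    return sinks[0]


def combine_sequential(first: ComponentDict, second: ComponentDict) -> ComponentDict:
    """Hypothetical helper: run `first` (e.g. preprocessing), then `second`."""
    clashes = set(first) & set(second)
    if clashes:
        raise ValueError(f"node names used in both graphs: {sorted(clashes)}")
    last = final_node(first)
    combined = {name: list(spec) for name, spec in first.items()}
    for name, (component, *inputs) in second.items():
        # Raw "X" in `second` now reads the transformed X from `first`.
        # Assumption: `first` does not change y, so "y" inputs are left alone.
        rewired = [f"{last}.x" if inp == "X" else inp for inp in inputs]
        combined[name] = [component, *rewired]
    return combined


preprocessing = {
    "Drop Columns Transformer": ["Drop Columns Transformer", "X", "y"],
    "Imputer": ["Imputer", "Drop Columns Transformer.x", "y"],
}
estimator = {
    "One Hot Encoder": ["One Hot Encoder", "X", "y"],
    "Random Forest Classifier": ["Random Forest Classifier", "One Hot Encoder.x", "y"],
}
combined = combine_sequential(preprocessing, estimator)
print(combined["One Hot Encoder"])  # ['One Hot Encoder', 'Imputer.x', 'y']
```

Raising on duplicate node names keeps the sketch short; a real helper would more likely rename colliding nodes, and would need to rewire `"y"` as well when preprocessing includes steps that modify the target.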

angela97lin • Oct 31 '21 04:10

I don't think this should be iced too much!

If I'm correct, @jeremyliweishih is working on a way to combine separate pipelines to put categorical feature selection "first" in the default algorithm.

In the design of #2511, I'm proposing we similarly combine two different pipelines into one pipeline.

@christopherbunn already wrote a util method to combine a list of pipelines into one pipeline, but its implementation is tailored specifically to stacked ensembles.

It seems like we need a general way to combine component graphs! Let's do it! Otherwise we'll end up with four different util methods doing basically the same thing.
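To illustrate how these cases reduce to one graph operation, here is a hypothetical sketch of the stacked-ensemble case: several pipelines become parallel branches, each fed the raw `X`, and each branch's final output feeds one final component. It assumes pipelines are plain dictionaries modeled on evalml's component-graph format (node name mapped to `[component name, *inputs]`, with inputs `"X"`, `"y"`, or `"<node>.x"` / `"<node>.y"`). `combine_parallel`, the `"Pipeline {i} - "` prefix, and the component names in the example are illustrative choices, not evalml's actual implementation.

```python
from typing import Dict, List

# Hypothetical format modeled on evalml's component-graph dicts:
# node name -> [component name, *inputs], inputs being "X", "y", or "<node>.x"/"<node>.y".
ComponentDict = Dict[str, List[str]]
RAW_INPUTS = ("X", "y")


def sink(graph: ComponentDict) -> str:
    """Return the one node whose output no other node consumes."""
    consumed = set()
    for _, *inputs in graph.values():
        consumed.update(inp.rsplit(".", 1)[0] for inp in inputs if inp not in RAW_INPUTS)
    sinks = [name for name in graph if name not in consumed]
    if len(sinks) != 1:
        raise ValueError(f"expected exactly one final node, found {sinks}")
    return sinks[0]


def prefixed(graph: ComponentDict, prefix: str) -> ComponentDict:
    """Copy `graph`, prefixing node names and every internal reference to them."""
    out: ComponentDict = {}
    for name, (component, *inputs) in graph.items():
        renamed = [inp if inp in RAW_INPUTS else prefix + inp for inp in inputs]
        out[prefix + name] = [component, *renamed]
    return out


def combine_parallel(graphs: List[ComponentDict], final_component: str) -> ComponentDict:
    """Hypothetical helper: run each graph as a branch on raw X, then merge."""
    combined: ComponentDict = {}
    final_inputs = []
    for i, graph in enumerate(graphs):
        prefix = f"Pipeline {i} - "  # keeps e.g. two "Imputer" nodes distinct
        combined.update(prefixed(graph, prefix))
        final_inputs.append(f"{prefix}{sink(graph)}.x")
    # The final component consumes every branch's output plus the raw target.
    combined[final_component] = [final_component, *final_inputs, "y"]
    return combined


branch_a = {
    "Imputer": ["Imputer", "X", "y"],
    "Random Forest Classifier": ["Random Forest Classifier", "Imputer.x", "y"],
}
branch_b = {
    "Imputer": ["Imputer", "X", "y"],
    "Logistic Regression Classifier": ["Logistic Regression Classifier", "Imputer.x", "y"],
}
combined = combine_parallel([branch_a, branch_b], "Elastic Net Classifier")
print(combined["Elastic Net Classifier"])
```

Prefixing node names is what lets two branches both contain an `Imputer` without colliding; a general helper would likely apply the same renaming whenever the graphs being combined share node names.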

freddyaboulton • Nov 01 '21 14:11

@freddyaboulton, I just saw your comment: I chatted with @chukarsten and we agreed that we should probably prioritize this higher, since it has come up in multiple different issues.

@chukarsten @dsherry Can we prioritize this for the upcoming sprint?

angela97lin • Nov 03 '21 17:11

@angela97lin @freddyaboulton @dsherry I'm taking the liberty and putting this one in our current sprint!

chukarsten • Nov 09 '21 21:11

Marking this as tech debt because there are two related issues that stem from not having an API for combining pipelines:

  • #3076
  • #2987

I think we'll continue to have issues with our combined pipelines if we have different util methods for each case when a combined pipeline is desired.

For context, we currently have two ways of combining pipelines:

This is also impacting time series with known-in-advance features, see PR #3149

freddyaboulton • Dec 13 '21 17:12

@angela97lin what's the status of this?

dsherry • Jan 31 '22 20:01

There's no current work being done on this, since it wasn't needed for the data check actions work. It's still a valid tech debt issue, but until we plan to combine the preprocessing pipeline for applying actions with the one generated by AutoMLSearch, it isn't relevant to data check actions specifically.

angela97lin • Mar 01 '22 17:03