evalml
Add helper method to combine pipelines
This was separated out from https://github.com/alteryx/evalml/issues/2058. https://github.com/alteryx/evalml/pull/2968 tackled the first half: creating a preprocessing pipeline that encompasses all of the components created from data check actions.
This issue will tackle the second half: adding a utility method to combine a preprocessing pipeline with another pipeline.
I don't think this should be iced too much!
If I'm correct, @jeremyliweishih is working on a way to combine separate pipelines to put categorical feature selection "first" in the default algorithm.
In the design of #2511, I'm proposing we similarly combine two different pipelines into one pipeline.
@christopherbunn already wrote a util method to combine a list of pipelines into one pipeline, but tailored the implementation specifically to stacked ensembles.
Seems like we need a general way to combine component graphs! Let's do it! Otherwise we'll have four different util methods doing basically the same thing.
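A minimal, plain-Python sketch of what a general "combine two component graphs" helper could look like. It does not call evalml. The dict format (component name mapped to `[component type, X input, y input]`) resembles evalml's component-graph dictionaries, but here it is an assumption. `combine_component_dicts`, `_final_component`, and the `"Second - "` prefix are hypothetical names. The sketch assumes the first graph ends in exactly one final component and that every entry has exactly one X input. It does not handle components in the first graph that transform `y`.

```python
from typing import Dict, List

# Assumed format for this sketch: name -> [component_type, x_input, y_input],
# where x_input is "X" (raw features) or "<upstream name>.x".
ComponentDict = Dict[str, List[str]]


def _final_component(graph: ComponentDict) -> str:
    """Return the single component whose .x output no other component consumes."""
    consumed = {x_in[:-2] for _, x_in, _ in graph.values() if x_in.endswith(".x")}
    sinks = [name for name in graph if name not in consumed]
    if len(sinks) != 1:
        raise ValueError(f"expected exactly one final component, found {sinks}")
    return sinks[0]


def combine_component_dicts(first: ComponentDict, second: ComponentDict,
                            prefix: str = "Second - ") -> ComponentDict:
    """Hypothetical helper: chain `second` after `first`.

    Raw "X" inputs in `second` are rewired to the output of `first`'s final
    component. Names from `second` get `prefix` so that two components of the
    same type (e.g. two imputers) do not collide.
    """
    tail = _final_component(first)
    combined: ComponentDict = {name: list(spec) for name, spec in first.items()}
    for name, (ctype, x_in, y_in) in second.items():
        new_x = f"{tail}.x" if x_in == "X" else f"{prefix}{x_in}"
        new_y = y_in if y_in == "y" else f"{prefix}{y_in}"  # raw y is left as-is
        new_name = f"{prefix}{name}"
        if new_name in combined:
            raise ValueError(f"duplicate component name {new_name!r}")
        combined[new_name] = [ctype, new_x, new_y]
    return combined


# Usage: a preprocessing graph followed by a modeling graph.
preprocessing = {
    "Imputer": ["Imputer", "X", "y"],
    "Drop Columns Transformer": ["Drop Columns Transformer", "Imputer.x", "y"],
}
modeling = {
    "One Hot Encoder": ["One Hot Encoder", "X", "y"],
    "Random Forest Classifier": ["Random Forest Classifier", "One Hot Encoder.x", "y"],
}
combined = combine_component_dicts(preprocessing, modeling)
```

Because one function covers both "preprocessing + estimator" and "categorical pipeline + everything else", the separate per-case utils could in principle share it.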
@freddyaboulton, since I just saw your comment: I just chatted with @chukarsten, and we agreed that we should probably prioritize this higher, since it has shown up in multiple different issues.
@chukarsten @dsherry Can we prioritize this for the upcoming sprint?
@angela97lin @freddyaboulton @dsherry I'm taking the liberty and putting this one in our current sprint!
Marking this as tech debt because there are two related issues that stem from not having an api for combining pipelines:
- #3076
- #2987
I think we'll continue to have issues with our combined pipelines if we have different util methods for each case when a combined pipeline is desired.
For context, we currently have two ways of combining pipelines:
This is also impacting time series with known-in-advance features, see PR #3149
@angela97lin what's the status of this?
There's no current work being done on this, as it wasn't necessary for the data check actions work. It's still a valid tech debt issue, but until we have plans to combine the preprocessing pipeline for applying actions with the one generated by AutoMLSearch, this isn't relevant to data check actions specifically.