
Generate stacked transform features within a single table if max_depth is greater than 1

Open thehomebrewnerd opened this issue 2 years ago • 2 comments

  • As a user, I wish I could use Featuretools to generate stacked transform features within a single table. Currently DFS will not generate any stacked transform features within a single table, regardless of the setting of the max_depth parameter. Allowing for stacking of features within a table when max_depth is greater than one could be useful in some situations.

One potential use case is generating features that capture interactions between transform features. For example, to determine whether a given datetime falls during lunch time on a weekend, DFS could apply a boolean multiply to the features generated by the primitives IsLunchTime and IsWeekend. Currently DFS generates only the boolean features from IsLunchTime and IsWeekend; it does not generate the stacked boolean multiplication feature automatically.
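To make the lunch-on-a-weekend example concrete, here is a minimal sketch in plain Python of what the stacked feature would compute. The functions `is_weekend`, `is_lunch_time`, and `weekend_lunch` are hypothetical stand-ins written for illustration, not the Featuretools primitives themselves, and the lunch hour of 12 is an assumption.

```python
from datetime import datetime


def is_weekend(ts: datetime) -> bool:
    # datetime.weekday() returns 0 for Monday through 6 for Sunday.
    return ts.weekday() >= 5


def is_lunch_time(ts: datetime, lunch_hour: int = 12) -> bool:
    # Assumption for this sketch: "lunch time" means the hour equals lunch_hour.
    return ts.hour == lunch_hour


def weekend_lunch(ts: datetime) -> bool:
    # The "stacked" feature: a boolean multiply (logical AND) of the two
    # depth-1 boolean features above.
    return is_weekend(ts) and is_lunch_time(ts)


timestamps = [
    datetime(2022, 7, 9, 12, 30),  # Saturday, during the lunch hour
    datetime(2022, 7, 9, 18, 0),   # Saturday, evening
    datetime(2022, 7, 6, 12, 15),  # Wednesday, during the lunch hour
]
stacked = [weekend_lunch(ts) for ts in timestamps]
```

Only the first timestamp satisfies both base features, so only it is flagged by the stacked feature.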

Users can manually define these types of stacked transform features, but it could be beneficial for DFS to handle this.

thehomebrewnerd, Jul 06 '22 16:07

One thing to note is that applying primitives in different orders can produce different sets of features, and I think that becomes really apparent with transform stacking. For example, if you had no boolean columns but included the primitives IsNull and And, And would never be used if it gets applied before IsNull. I think we made changes at some point to sort the input primitives, so that users who pass the same primitives in different orders don't get different sets of features.
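The ordering effect can be shown with a toy model. This is a sketch under stated assumptions, not how DFS is implemented: `single_pass`, the tuple-based primitive specs, and the column names are all hypothetical, and every generated feature is assumed to be boolean.

```python
from itertools import combinations

# Hypothetical primitive specs: (name, number of inputs, required input type).
# An input type of None means "accepts any type".
IS_NULL = ("IS_NULL", 1, None)
AND = ("AND", 2, "boolean")


def single_pass(columns, primitives):
    """Apply each primitive once, in the given order, to the features
    that exist at the moment that primitive is applied."""
    features = dict(columns)  # feature name -> type
    for name, n_inputs, input_type in primitives:
        eligible = sorted(
            f for f, t in features.items()
            if input_type is None or t == input_type
        )
        for args in combinations(eligible, n_inputs):
            features[f"{name}({', '.join(args)})"] = "boolean"
    # Return only the newly generated feature names.
    return sorted(set(features) - set(columns))


columns = {"age": "numeric", "city": "categorical"}  # no boolean columns

and_first = single_pass(columns, [AND, IS_NULL])
null_first = single_pass(columns, [IS_NULL, AND])


def by_name(primitive):
    return primitive[0]


# Sorting the primitives first makes the result independent of input order.
sorted_a = single_pass(columns, sorted([AND, IS_NULL], key=by_name))
sorted_b = single_pass(columns, sorted([IS_NULL, AND], key=by_name))
```

In this toy model, `and_first` contains only the two IS_NULL features, while `null_first` also contains an AND feature. Sorting makes the two calls agree, although it does not by itself guarantee the richer result.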

Another thing to worry about with transform stacking is that you can stack transform primitives indefinitely, which can lead to long chains of unhelpful primitives stacked on top of each other. But I think that problem exists with agg primitives too, so it's probably not a huge issue.

tamargrey, Jul 06 '22 16:07

My thought is that this would work in "passes", with the number of passes equal to the max_depth setting. The first pass would generate the features that we currently generate. Then, if max_depth is set to more than 1, we would make another pass and generate additional features based on the features from the first pass, continuing this process until max_depth is reached.

I think the number of features generated could be a real concern here though, and we would need to think through what this looks like in a multi-table setup. I'm also sure I'm over-simplifying this in my mind.
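A rough sketch of the pass-based idea, in plain Python, also gives a feel for how quickly the feature count can grow. Everything here is an assumption made for illustration: `stack_in_passes`, the `PRIMITIVES` table, and the single-table column names are hypothetical, and this is not the DFS implementation.

```python
from itertools import combinations

# Hypothetical primitive specs: name -> (number of inputs, required input type).
# An input type of None means "accepts any type".
PRIMITIVES = {"AND": (2, "boolean"), "IS_NULL": (1, None)}


def stack_in_passes(columns, primitives, max_depth):
    """Generate features in passes; each pass may only build on features
    produced by earlier passes. Returns (features, new features per pass)."""
    features = dict(columns)  # feature name -> type
    counts = []
    for _ in range(max_depth):
        snapshot = dict(features)  # inputs available at the start of this pass
        new = {}
        for name in sorted(primitives):  # sorted for order-independent output
            n_inputs, input_type = primitives[name]
            eligible = sorted(
                f for f, t in snapshot.items()
                if input_type is None or t == input_type
            )
            for args in combinations(eligible, n_inputs):
                feature = f"{name}({', '.join(args)})"
                if feature not in features:
                    new[feature] = "boolean"
        if not new:
            break  # nothing left to generate
        features.update(new)
        counts.append(len(new))
    return features, counts


columns = {"age": "numeric", "city": "categorical"}
depth1, counts1 = stack_in_passes(columns, PRIMITIVES, max_depth=1)
depth2, counts2 = stack_in_passes(columns, PRIMITIVES, max_depth=2)
depth3, counts3 = stack_in_passes(columns, PRIMITIVES, max_depth=3)
```

With two columns and two primitives, this toy model adds 2 features in the first pass, 3 in the second, and 12 in the third, including low-value ones such as `IS_NULL(IS_NULL(age))`. Some form of pruning would likely be needed in practice.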

thehomebrewnerd, Jul 06 '22 16:07