Add target name to pipeline's `input_feature_names`

Open angela97lin opened this issue 4 years ago • 3 comments

Note: migrated from #1493 which tracks simply adding target names to the output of predict; this issue tracks a suggestion also made in the thread.

Based on the discussion raised by @gsheni and @kmax12 in Slack, we should update the .input_feature_names value held by our pipelines to hold both the feature column names and the target name.

Currently, .input_feature_names returns a dictionary, where the keys are the components and the corresponding values are the feature names that the component gets. We could update this to be a tuple, where the first element of the tuple is the current list of feature names, and the second element of the tuple is the name of the target.

That is, currently we have something like:

{"OHE": ["col1", "col2", "col3"], "Imputer": ["col1_1", "col1_2", "col2", "col3"]}

We could update this to:

{"OHE": (["col1", "col2", "col3"], "target"), "Imputer": (["col1_1", "col1_2", "col2", "col3"], "target")}

I don't see the target name changing, so maybe this is a bit silly, but it also keeps the nice structure we have now where we keep track of what every component sees.
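
To make the proposed shape concrete, here is a minimal sketch in plain Python of going from the current dictionary of lists to a dictionary of (feature names, target name) tuples. The helper `with_target_name` is hypothetical and is not part of evalml; it only illustrates the data structure, not how the pipeline would populate it.

```python
from typing import Dict, List, Optional, Tuple


def with_target_name(
    input_feature_names: Dict[str, List[str]],
    target_name: Optional[str],
) -> Dict[str, Tuple[List[str], Optional[str]]]:
    """Hypothetical helper (not an evalml API): pair each component's
    feature names with the target name."""
    return {
        component: (list(features), target_name)
        for component, features in input_feature_names.items()
    }


# Current shape: component name -> feature names that component receives.
current = {
    "OHE": ["col1", "col2", "col3"],
    "Imputer": ["col1_1", "col1_2", "col2", "col3"],
}

# Proposed shape: component name -> (feature names, target name).
proposed = with_target_name(current, "target")

# Callers would unpack the tuple per component.
features, target = proposed["OHE"]
```

One consequence of this shape is that any existing caller that treats the dictionary values as plain lists would need to unpack the tuple instead.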

angela97lin avatar Dec 17 '20 17:12 angela97lin

@angela97lin @dsherry : Didn't you merge a PR that addressed this already? I swear I reviewed it.

chukarsten avatar Jan 15 '21 18:01 chukarsten

@chukarsten Ah, similar! I merged in https://github.com/alteryx/evalml/pull/1578 which keeps track of the target name, but it doesn't update input_feature_names to do so. I guess this issue would track adding / consolidating this to the input_feature_names attribute, if we still think that's useful.

angela97lin avatar Jan 15 '21 19:01 angela97lin

Once #1757 is done, this issue also tracks adding the target name to estimators. (It is accessible from a pandas Series via its name attribute.)
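
As a small illustration of the point about the name attribute, the sketch below reads the target name from a pandas Series. The variable names are made up for the example, and how an estimator would store the value is left open here.

```python
import pandas as pd

# A target passed as a named Series carries its name with it.
y = pd.Series([0, 1, 0, 1], name="target")
target_name = y.name

# A Series created without a name reports None, so a stored target name
# would need to allow for that case.
unnamed = pd.Series([0, 1, 0, 1])
unnamed_target_name = unnamed.name
```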

dsherry avatar Feb 04 '21 19:02 dsherry