API to view the features provided as input to each component in a pipeline

Open kmax12 opened this issue 5 years ago • 5 comments

Is this the best way to do it?

pipeline = clf.best_pipeline
pipeline.input_feature_names[pipeline.estimator.name]

We might want to think through improving this API, or at least document that this is how to do it.

kmax12 avatar Nov 20 '19 22:11 kmax12

@kmax12 , could you explain what the goal is here? I think I don't have the right context yet to understand the motivation. Why does the estimator need the feature names?

And did you mean for input_feature_names to be a function here? It appears to be some sort of getter method, in which case it's just going to return a string; I thought you were looking for a setter.

dsherry avatar Nov 22 '19 16:11 dsherry

@dsherry In our pipeline class (PipelineBase), we store the names of the input features to each step of the pipeline in a dictionary called self.input_feature_names when we call fit(), so pipeline.input_feature_names[pipeline.estimator.name] will actually return a pd.DataFrame object.

I think the idea of storing the feature names passed to the estimator is that it allows us to know how many features / which features were used in training the model.
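A minimal sketch of that bookkeeping, not evalml's actual implementation: `SketchPipeline`, `DropColumns`, and `MajorityClassifier` are hypothetical stand-ins, data is a plain dict of column name to values, and the sketch stores plain lists of column names for each step.

```python
class DropColumns:
    """Hypothetical feature-selection step: keeps only the named columns."""

    name = "Drop Columns"

    def __init__(self, keep):
        self.keep = list(keep)

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        return {col: X[col] for col in self.keep}


class MajorityClassifier:
    """Hypothetical estimator: always predicts the most common training label."""

    name = "Majority Classifier"

    def fit(self, X, y):
        labels = list(y)
        self.prediction_ = max(set(labels), key=labels.count)
        return self

    def predict(self, X):
        n_rows = len(next(iter(X.values())))
        return [self.prediction_] * n_rows


class SketchPipeline:
    """Records, during fit(), the column names each step receives."""

    def __init__(self, transformers, estimator):
        self.transformers = transformers
        self.estimator = estimator
        self.input_feature_names = {}

    def fit(self, X, y):
        for step in self.transformers:
            # Record what this step sees before it changes the data.
            self.input_feature_names[step.name] = list(X)
            X = step.fit(X, y).transform(X)
        # The estimator may see fewer columns than the pipeline was given.
        self.input_feature_names[self.estimator.name] = list(X)
        self.estimator.fit(X, y)
        return self


X = {"age": [31, 45, 27], "income": [50, 80, 40], "zip": [2139, 94105, 10001]}
y = [0, 1, 1]
pipeline = SketchPipeline([DropColumns(["age", "income"])], MajorityClassifier()).fit(X, y)
estimator_features = pipeline.input_feature_names[pipeline.estimator.name]
```

Here `estimator_features` holds only the columns that survived the selection step, while the entry for `"Drop Columns"` holds all three original columns.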

angela97lin avatar Nov 22 '19 16:11 angela97lin

@angela97lin thanks, that makes sense.

Notes from discussing this with Max just now:

This originally came from user feedback. The user wanted to see the list of features which were provided to the estimator's fit method. This could be a subset of the full list of features given to the pipeline if the pipeline has a feature selection component.

This ticket tracks the following decision: do we keep things as-is RE input features, or do we update the API for accessing each component's input features to make it easier or more clear?

dsherry avatar Dec 09 '19 17:12 dsherry

This issue has sat around for a while. Let's have it track designing APIs which do the following:

  • Allow users to access the feature names passed as inputs to each component after a pipeline has been trained
  • Allow users to access the feature values passed as inputs during a pipeline evaluation. This could be done simply by supporting slicing component graphs and then evaluating the sliced fragment on some data and returning the output.
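To make the slicing idea concrete, here is a rough sketch under simplifying assumptions: the pipeline is a linear list of `(name, function)` steps rather than a real component graph, and `slice_before` and `evaluate` are hypothetical helpers, not evalml functions.

```python
def slice_before(steps, component_name):
    """Return the steps that run before the named component."""
    names = [name for name, _ in steps]
    if component_name not in names:
        raise KeyError(component_name)
    return steps[: names.index(component_name)]


def evaluate(steps, X):
    """Run each step in order and return the final output."""
    for _, transform in steps:
        X = transform(X)
    return X


def impute_zero(X):
    # Replace missing values with 0 in every column.
    return {col: [0 if v is None else v for v in vals] for col, vals in X.items()}


def keep_age(X):
    # Stand-in for a feature selection component.
    return {"age": X["age"]}


def predict_adult(X):
    # Stand-in for an estimator.
    return [v >= 18 for v in X["age"]]


steps = [("Imputer", impute_zero), ("Selector", keep_age), ("Estimator", predict_adult)]
X = {"age": [30, None], "zip": [2139, 94105]}

# Feature values that would be passed to the estimator for this data.
estimator_inputs = evaluate(slice_before(steps, "Estimator"), X)
```

Evaluating the sliced fragment yields the imputed, selected features without running the estimator itself.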

dsherry avatar May 08 '20 22:05 dsherry

Our component graph now supports two things which help here:

  1. Compute features provided to estimator
  2. View the output of each component in the graph

We don't actually have point 2 exposed in an API.

Let's let this issue now track exposing a way to access each component's output from a component graph evaluation.
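One possible shape for such an API, sketched with a toy graph rather than evalml's component graph class: `evaluate_graph` is a hypothetical function that returns every component's output keyed by component name. The graph format (name mapped to a function and its parent names, with `"X"` denoting the raw input) is an assumption made for illustration.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def evaluate_graph(graph, X):
    """Evaluate every component and return {component name: output}."""
    outputs = {"X": X}
    parents_by_name = {name: parents for name, (_, parents) in graph.items()}
    # static_order() yields each node after all of its parents.
    for name in TopologicalSorter(parents_by_name).static_order():
        if name == "X":
            continue  # raw input, already available
        func, parents = graph[name]
        outputs[name] = func(*(outputs[p] for p in parents))
    del outputs["X"]
    return outputs


graph = {
    "Scale": (lambda X: [v / 10 for v in X], ["X"]),
    "Square": (lambda X: [v * v for v in X], ["X"]),
    "Sum": (lambda a, b: [p + q for p, q in zip(a, b)], ["Scale", "Square"]),
}

component_outputs = evaluate_graph(graph, [10, 20])
```

Returning a dict keeps the intermediate results addressable by name, so a user can inspect any component's output after a single evaluation.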

dsherry avatar Jun 18 '21 19:06 dsherry