Coordinate equivalent output from `component_dict` across list and dict inputs

Open ParthivNaresh opened this issue 3 years ago • 0 comments

Currently, if a list is passed as the component_graph to a pipeline, the resulting component_dict holds the component class object as the first item of each value. The list is converted within _make_component_dict_from_component_list:

from evalml.pipelines import RegressionPipeline


class LinearRegressionPipeline(RegressionPipeline):
    """Linear Regression Pipeline for regression problems."""

    # A flat list of component names; the pipeline wires them linearly.
    component_graph = [
        "One Hot Encoder",
        "Imputer",
        "Standard Scaler",
        "Linear Regressor",
    ]
    custom_name = "Linear Regression Pipeline"

    def __init__(self, parameters, random_seed=0):
        super().__init__(
            self.component_graph,
            parameters=parameters,
            custom_name=self.custom_name,
            random_seed=random_seed,
        )


pipeline_ = LinearRegressionPipeline({})
print(pipeline_.component_graph.component_dict)
----------------------------------------------------------
{'Imputer': [<class 'evalml.pipelines.components.transformers.imputers.imputer.Imputer'>,
             'One Hot Encoder.x',
             'y'],
 'Linear Regressor': [<class 'evalml.pipelines.components.estimators.regressors.linear_regressor.LinearRegressor'>,
                      'Standard Scaler.x',
                      'y'],
 'One Hot Encoder': [<class 'evalml.pipelines.components.transformers.encoders.onehot_encoder.OneHotEncoder'>,
                     'X',
                     'y'],
 'Standard Scaler': [<class 'evalml.pipelines.components.transformers.scalers.standard_scaler.StandardScaler'>,
                     'Imputer.x',
                     'y']}
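The linear wiring visible in the output above (each component reads the previous component's `.x` output, and the first reads `X`) can be sketched in plain Python. This is only an illustration under stated assumptions: `make_linear_component_dict` is a hypothetical helper, not evalml's `_make_component_dict_from_component_list`, and it stores the component name as the first item rather than the class object.

```python
def make_linear_component_dict(component_list):
    """Wire a flat list of component names into a linear graph dict.

    Hypothetical helper for illustration; not part of evalml.
    """
    component_dict = {}
    previous = None
    for name in component_list:
        # The first component reads the raw features "X";
        # every later component reads the previous component's ".x" output.
        features = "X" if previous is None else f"{previous}.x"
        component_dict[name] = [name, features, "y"]
        previous = name
    return component_dict


linear = make_linear_component_dict(
    ["One Hot Encoder", "Imputer", "Standard Scaler", "Linear Regressor"]
)
```

Here the edges match the output above, but the first item of each value is a string, which is the form the dictionary input produces.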

If a dictionary is passed instead, the first item of each value is the component name, since the component_graph is passed directly to the ComponentGraph class:

component_graph = {
    "Imputer": ["Imputer", "X", "y"],
    "Target Imputer": ["Target Imputer", "X", "y"],
    "OneHot_RandomForest": ["One Hot Encoder", "Imputer.x", "Target Imputer.y"],
    "OneHot_ElasticNet": ["One Hot Encoder", "Imputer.x", "y"],
    "Random Forest": ["Random Forest Classifier", "OneHot_RandomForest.x", "y"],
    "Elastic Net": ["Elastic Net Classifier", "OneHot_ElasticNet.x", "y"],
    "Logistic Regression": [
        "Logistic Regression Classifier",
        "Random Forest.x",
        "Elastic Net.x",
        "y",
    ],
}
----------------------------------------------------------
{'Elastic Net': ['Elastic Net Classifier', 'OneHot_ElasticNet.x', 'y'],
 'Imputer': ['Imputer', 'X', 'y'],
 'Logistic Regression': ['Logistic Regression Classifier',
                         'Random Forest.x',
                         'Elastic Net.x',
                         'y'],
 'OneHot_ElasticNet': ['One Hot Encoder', 'Imputer.x', 'y'],
 'OneHot_RandomForest': ['One Hot Encoder', 'Imputer.x', 'Target Imputer.y'],
 'Random Forest': ['Random Forest Classifier', 'OneHot_RandomForest.x', 'y'],
 'Target Imputer': ['Target Imputer', 'X', 'y']}

Since component_dict is easily accessible to OS users, its output should be standardized so that list and dict inputs produce the same kind of first item.
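One possible direction, sketched under assumptions rather than proposed as the fix: normalize every entry so the first item is always the component's name. `normalize_component_dict` and the stand-in `FakeImputer` class are hypothetical; the sketch assumes each component class exposes a `name` class attribute and falls back to `__name__` otherwise.

```python
import inspect


def normalize_component_dict(component_dict):
    """Return a copy whose first item per entry is a component name string.

    Hypothetical helper for illustration; not part of evalml.
    """
    normalized = {}
    for key, (component, *edges) in component_dict.items():
        if inspect.isclass(component):
            # Assumption: component classes carry a `name` class attribute.
            component = getattr(component, "name", component.__name__)
        normalized[key] = [component, *edges]
    return normalized


class FakeImputer:
    """Stand-in for a component class, used only in this example."""

    name = "Imputer"


from_list = {"Imputer": [FakeImputer, "X", "y"]}
from_dict = {"Imputer": ["Imputer", "X", "y"]}
same = normalize_component_dict(from_list) == normalize_component_dict(from_dict)
```

Standardizing on the class object instead of the name would work equally well for comparison purposes; the point is only that both input types end up with the same representation.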

ParthivNaresh, Sep 23 '21 19:09