NimbusML: User-defined transforms drop features if not explicitly specified


Status: Open. pieths opened this issue 5 years ago (0 comments).

When a transform that acts on only a subset of the input columns is placed before a predictor, and the predictor's features are not specified explicitly, only the transform's output columns are passed to the predictor as features, rather than all of the input columns.

In the following code, a DataFrame is created with two columns, c1 and c2. The c2 column is used as the label, and the feature argument is not specified for OnlineGradientDescentRegressor (OGDRegressor). When fit is executed, only the output of RangeFilter is sent as features to the regressor; the c1 column is not included as a feature.

Is this the expected behavior, or, since the features are not specified explicitly, should all columns be passed through to the regressor? The latter is what happens when no transform precedes the predictor.

import numpy as np
import pandas as pd
from nimbusml import Pipeline
from nimbusml.linear_model import OnlineGradientDescentRegressor
from nimbusml.preprocessing.filter import RangeFilter

train_data = {'c1': [1, 2, 3, 4], 'c2': [2, 3, 4, 5]}
train_df = pd.DataFrame(train_data).astype(np.float32)

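pipeline = Pipeline([
    # RangeFilter acts only on c2: rows whose c2 is outside [0.0, 4.5] are dropped
    RangeFilter(min=0.0, max=4.5) << 'c2',
    # No feature= argument, so the pipeline decides which columns are features
    OnlineGradientDescentRegressor(label='c2')])
pipeline.fit(train_df)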
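The row filtering itself does not remove c1 from the data. As a NimbusML-free illustration, the pandas sketch below approximates the RangeFilter step with a boolean mask on c2. It assumes RangeFilter keeps rows whose value lies within [0.0, 4.5]; no c2 value equals a bound, so bound inclusivity does not matter for this data.

```python
import numpy as np
import pandas as pd

# Same training data as in the reproduction above.
train_df = pd.DataFrame({'c1': [1, 2, 3, 4], 'c2': [2, 3, 4, 5]}).astype(np.float32)

# Approximation (not NimbusML): treat RangeFilter(min=0.0, max=4.5) << 'c2'
# as a row filter that keeps rows with 0.0 <= c2 <= 4.5.
filtered = train_df[(train_df['c2'] >= 0.0) & (train_df['c2'] <= 4.5)]

# The row with c2 == 5 is dropped, but both columns survive,
# so c1 would still be available as a feature.
print(filtered)
print(list(filtered.columns))  # → ['c1', 'c2']
```

Because the filter changes only which rows remain, the missing c1 feature comes from how the feature list is built, not from the data the transform produces.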

Here is the FeatureCombiner node that gets passed to ML.NET:

{
    "Inputs": {
        "Data": "$label_data",
        "Features": [
            "c2"
        ]
    },
    "Name": "Transforms.FeatureCombiner",
    "Outputs": {
        "Model": "$output_model4",
        "OutputData": "$output_data"
    }
},
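To make the two behaviors in question concrete, here is a toy model of how a pipeline might choose a predictor's feature columns. It is purely illustrative: `infer_features` and its `pass_through` flag are hypothetical and are not NimbusML code. Label handling is deliberately left out, mirroring the FeatureCombiner node above, which lists c2 even though c2 is also the label.

```python
from typing import List, Optional, Sequence


def infer_features(input_columns: Sequence[str],
                   transform_outputs: Optional[Sequence[str]] = None,
                   pass_through: bool = False) -> List[str]:
    """Toy model of feature selection for a predictor.

    Hypothetical helper for illustration only; it is not NimbusML code.
    """
    if transform_outputs is None:
        # No transform before the predictor: all input columns are passed through.
        return list(input_columns)
    if not pass_through:
        # Behavior reported in the issue: only the transform's outputs are used.
        return list(transform_outputs)
    # Behavior the issue asks about: keep every input column and append any
    # new columns the transform produced, without duplicates.
    features = list(input_columns)
    for col in transform_outputs:
        if col not in features:
            features.append(col)
    return features


reported = infer_features(['c1', 'c2'], transform_outputs=['c2'])
requested = infer_features(['c1', 'c2'], transform_outputs=['c2'], pass_through=True)
print(reported)   # ['c2']
print(requested)  # ['c1', 'c2']
```

In this model, the reported behavior matches the Features list of the node above, while the pass-through variant keeps c1 as the no-transform case does.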

Feb 14 '20 21:02 pieths
