handson-ml2: Getting a ValueError when evaluating a model on test data

ValueError: Number of features of the input must be equal to or greater than that of the fitted transformer. Transformer n_features is 13 and input n_features is 9.
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-24-7c7c1ba811cd> in <module>
     28 y_test = strat_test_set["median_house_value"].copy()
     29 
---> 30 x_test_prepared = full_pipeline.transform(x_test)
     31 
     32 final_predictions = final_model.predict(x_test_prepared)

~/anaconda3/lib/python3.7/site-packages/sklearn/compose/_column_transformer.py in transform(self, X)
    583                              'transformer. Transformer n_features is {0} '
    584                              'and input n_features is {1}.'
--> 585                              .format(self._n_features, X.shape[1]))
    586 
    587         # No column reordering allowed for named cols combined with remainder

ValueError: Number of features of the input must be equal to or greater than that of the fitted transformer. Transformer n_features is 13 and input n_features is 9.

Jan 18 '21 02:01 SachinSarin

Hi, thanks for your feedback. It looks like the code you are running is slightly different from the one in the notebook: you are using lowercase names like x_test and x_test_prepared instead of X_test and X_test_prepared. I suspect that the pipeline was fitted on one dataset but is then run on a different dataset with a different list of columns: the error says the transformer was fitted on 13 columns, but the input only has 9. Please make sure you are running exactly the same code as in the notebook.
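One way to catch this kind of mismatch before calling transform() is to compare the columns that the ColumnTransformer selects with the columns actually present in the test set. The sketch below is a minimal illustration, not a scikit-learn feature: missing_columns is a hypothetical helper, the toy column names are made up, and it assumes the column lists were passed by name (as with num_attribs and cat_attribs in the notebook), not as indices, slices, or callables. It only checks the columns the transformers select, not any extra columns that were dropped by the remainder.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler


def missing_columns(column_transformer, X):
    """Hypothetical helper: return the column names selected by a
    ColumnTransformer that are absent from the DataFrame X.

    Assumes each column spec is a name or a list of names.
    """
    expected = []
    # .transformers is the list of (name, transformer, columns) triples
    # passed to the constructor; it exists even before fitting.
    for _name, _transformer, columns in column_transformer.transformers:
        if isinstance(columns, str):
            columns = [columns]
        expected.extend(columns)
    return [col for col in expected if col not in X.columns]


full_pipeline = ColumnTransformer([
    ("num", StandardScaler(), ["num1", "num2"]),
    ("cat", OneHotEncoder(), ["cat1"]),
])

X_test = pd.DataFrame({"num1": [1., 3.], "cat1": ["B", "C"]})

missing = missing_columns(full_pipeline, X_test)
if missing:
    print("Test set is missing columns:", missing)  # prints ['num2']
```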

Here's a simple code example that reproduces the same exception:

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

num_attribs = ["num1", "num2"]
cat_attribs = ["cat1"]

full_pipeline = ColumnTransformer([
        ("num", StandardScaler(), num_attribs),
        ("cat", OneHotEncoder(), cat_attribs),
    ])

# The pipeline is fitted on a DataFrame with 3 columns: num1, num2, cat1
X_train = pd.DataFrame({
    "num1": [1., 2., 3.],
    "num2": [4., 5., 6.],
    "cat1": ["A", "B", "A"]
})

X_train_prepared = full_pipeline.fit_transform(X_train)

# The test DataFrame only has 2 columns: num2 is missing
X_test = pd.DataFrame({
    "num1": [1., 3.],
    "cat1": ["B", "C"]
})

# Raises ValueError (the exact message depends on the scikit-learn version)
X_test_prepared = full_pipeline.transform(X_test)
```

Note that X_test is missing the num2 column.
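A minimal sketch of the fix for this toy example: give X_test the same columns as X_train. With this particular data, adding num2 back is not quite enough, because X_test also contains the category "C", which OneHotEncoder never saw during fitting; with its default handle_unknown="error", transform() would then fail with a different error about unknown categories. The sketch assumes that ignoring unseen categories is acceptable and sets handle_unknown="ignore"; the num2 test values are made up.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

num_attribs = ["num1", "num2"]
cat_attribs = ["cat1"]

# handle_unknown="ignore" encodes categories unseen during fit (here "C")
# as all zeros instead of raising an error at transform time.
full_pipeline = ColumnTransformer([
    ("num", StandardScaler(), num_attribs),
    ("cat", OneHotEncoder(handle_unknown="ignore"), cat_attribs),
])

X_train = pd.DataFrame({
    "num1": [1., 2., 3.],
    "num2": [4., 5., 6.],
    "cat1": ["A", "B", "A"],
})
X_train_prepared = full_pipeline.fit_transform(X_train)

# The test set now has every column the pipeline was fitted on.
X_test = pd.DataFrame({
    "num1": [1., 3.],
    "num2": [4., 6.],
    "cat1": ["B", "C"],
})
X_test_prepared = full_pipeline.transform(X_test)

# 2 rows; 2 scaled numeric columns + 2 one-hot columns (A and B)
print(X_test_prepared.shape)  # (2, 4)
```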

Hope this helps.

Mar 01 '21 08:03 ageron

X_train and X_test should have the same columns. Once you wrap the ColumnTransformer together with the model in a scikit-learn Pipeline, the preprocessing becomes part of the model: when you evaluate the model on X_test, it applies the same preprocessing that was fitted on the training set, and the error will vanish.
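A minimal sketch of that idea, assuming a LinearRegression as the model and reusing toy columns like the ones in the example above; the names preprocessing and full_model and the target values are made up for illustration. The Pipeline does not make missing columns acceptable: X_test still needs num1, num2, and cat1. What it does guarantee is that predict() runs X_test through the exact preprocessing that fit() fitted on the training data.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

preprocessing = ColumnTransformer([
    ("num", StandardScaler(), ["num1", "num2"]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["cat1"]),
])

# Preprocessing and regressor form a single estimator: fit() fits both on
# the training set, and predict() reuses the fitted preprocessing.
full_model = Pipeline([
    ("preprocessing", preprocessing),
    ("regressor", LinearRegression()),
])

X_train = pd.DataFrame({
    "num1": [1., 2., 3.],
    "num2": [4., 5., 6.],
    "cat1": ["A", "B", "A"],
})
y_train = [10., 20., 30.]  # made-up targets, for illustration only
full_model.fit(X_train, y_train)

# X_test still needs the same columns as X_train.
X_test = pd.DataFrame({
    "num1": [1., 3.],
    "num2": [4., 6.],
    "cat1": ["B", "C"],
})
predictions = full_model.predict(X_test)
print(predictions.shape)  # (2,)
```

Keeping the preprocessing inside the model also means there is only one object to save and load, so the test set cannot accidentally be transformed by a pipeline fitted on a different DataFrame.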

Oct 04 '21 12:10 kabartay