Getting a ValueError when evaluating a model on test data
ValueError: Number of features of the input must be equal to or greater than that of the fitted transformer. Transformer n_features is 13 and input n_features is 9.
```
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-24-7c7c1ba811cd> in <module>
28 y_test = strat_test_set["median_house_value"].copy()
29
---> 30 x_test_prepared = full_pipeline.transform(x_test)
31
32 final_predictions = final_model.predict(x_test_prepared)
~/anaconda3/lib/python3.7/site-packages/sklearn/compose/_column_transformer.py in transform(self, X)
583 'transformer. Transformer n_features is {0} '
584 'and input n_features is {1}.'
--> 585 .format(self._n_features, X.shape[1]))
586
587 # No column reordering allowed for named cols combined with remainder
ValueError: Number of features of the input must be equal to or greater than that of the fitted transformer. Transformer n_features is 13 and input n_features is 9.
```
Hi, thanks for your feedback. It looks like the code you are running is slightly different from the one in the notebook: you are using lowercase names like x_test and x_test_prepared instead of X_test and X_test_prepared. I suspect that full_pipeline was fitted on one dataset but is then being applied to a different dataset with a different list of columns (the fitted transformer expects 13 columns, but your x_test has only 9). Please make sure you are running exactly the same code as in the notebook.
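One way to check this is to compare the columns the fitted ColumnTransformer selects against the columns of the test DataFrame. The sketch below assumes the transformers select columns by name (as num_attribs and cat_attribs do in the notebook); `expected_columns` and `missing_columns` are hypothetical helper names, and the tiny DataFrames stand in for the housing data. On scikit-learn 1.0 or later, the fitted ColumnTransformer also records the training column names in `feature_names_in_`.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler


def expected_columns(ct):
    """Collect the column names a fitted ColumnTransformer selects by name (hypothetical helper)."""
    cols = []
    for name, _transformer, columns in ct.transformers_:
        # Skip the remainder entry, which holds integer indices rather than names
        if name == "remainder":
            continue
        # A single column may be given as a plain string rather than a list
        if isinstance(columns, str):
            columns = [columns]
        cols.extend(c for c in columns if isinstance(c, str))
    return cols


def missing_columns(ct, df):
    """Return the expected column names that are absent from df (hypothetical helper)."""
    return [c for c in expected_columns(ct) if c not in df.columns]


ct = ColumnTransformer([
    ("num", StandardScaler(), ["num1", "num2"]),
    ("cat", OneHotEncoder(), ["cat1"]),
])
ct.fit(pd.DataFrame({"num1": [1., 2.], "num2": [3., 4.], "cat1": ["A", "B"]}))

X_test = pd.DataFrame({"num1": [1.], "cat1": ["B"]})
print(missing_columns(ct, X_test))  # ['num2']
```

Running such a check on x_test just before calling full_pipeline.transform() should point at the columns that went missing.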
Here's a simple code example that reproduces the same exception:
```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

num_attribs = ["num1", "num2"]
cat_attribs = ["cat1"]

full_pipeline = ColumnTransformer([
    ("num", StandardScaler(), num_attribs),
    ("cat", OneHotEncoder(), cat_attribs),
])

# The pipeline is fitted on three columns: num1, num2 and cat1
X_train = pd.DataFrame({
    "num1": [1., 2., 3.],
    "num2": [4., 5., 6.],
    "cat1": ["A", "B", "A"],
})
X_train_prepared = full_pipeline.fit_transform(X_train)

# X_test has only two columns: num2 is missing
X_test = pd.DataFrame({
    "num1": [1., 3.],
    "cat1": ["B", "C"],
})
X_test_prepared = full_pipeline.transform(X_test)  # raises ValueError
```
Note that X_test is missing the num2 column.
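For comparison, here is the same toy example with num2 restored in X_test. One extra assumption: the test category "C" never appears in X_train, and by default OneHotEncoder raises a different error on unseen categories, so this sketch passes handle_unknown="ignore", which encodes unknown categories as all zeros.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

num_attribs = ["num1", "num2"]
cat_attribs = ["cat1"]

full_pipeline = ColumnTransformer([
    ("num", StandardScaler(), num_attribs),
    # handle_unknown="ignore" encodes categories unseen during fit (here "C") as all zeros
    ("cat", OneHotEncoder(handle_unknown="ignore"), cat_attribs),
])

X_train = pd.DataFrame({
    "num1": [1., 2., 3.],
    "num2": [4., 5., 6.],
    "cat1": ["A", "B", "A"],
})
X_train_prepared = full_pipeline.fit_transform(X_train)

# Same columns as X_train this time
X_test = pd.DataFrame({
    "num1": [1., 3.],
    "num2": [4., 6.],
    "cat1": ["B", "C"],
})
X_test_prepared = full_pipeline.transform(X_test)
print(X_test_prepared.shape)  # (2, 4): 2 scaled numeric columns + 2 one-hot columns (A, B)
```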
Hope this helps.
X_train and X_test should have the same columns.
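If you wrap the ColumnTransformer and the model together in a single scikit-learn Pipeline, the preprocessing becomes part of the model: when you evaluate the model on X_test, the Pipeline applies the same transformations that were fitted on the training set, so you can no longer accidentally transform the test set with a pipeline fitted on different data. X_test must still contain the same columns as X_train, though, otherwise the transform step will still fail.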
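A minimal sketch of that approach, assuming a LinearRegression as a stand-in for the notebook's final_model and the toy columns from the example above. Because the ColumnTransformer is a step inside the Pipeline, fit() fits it on the training set and predict() reuses those fitted transformations.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

preprocessing = ColumnTransformer([
    ("num", StandardScaler(), ["num1", "num2"]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["cat1"]),
])

# Preprocessing and model combined into one estimator (LinearRegression is a stand-in)
model = Pipeline([
    ("preprocessing", preprocessing),
    ("regressor", LinearRegression()),
])

X_train = pd.DataFrame({
    "num1": [1., 2., 3., 4.],
    "num2": [4., 5., 6., 8.],
    "cat1": ["A", "B", "A", "B"],
})
y_train = pd.Series([10., 12., 14., 17.])

# fit() fits the ColumnTransformer, transforms X_train, then fits the regressor
model.fit(X_train, y_train)

X_test = pd.DataFrame({
    "num1": [1.5, 3.5],
    "num2": [4.5, 7.0],
    "cat1": ["B", "A"],
})
# predict() applies the already-fitted transformations, then the regressor
predictions = model.predict(X_test)
print(predictions.shape)  # (2,)
```

With this design there is only one object to fit and evaluate, so the test set cannot be run through a separately fitted full_pipeline by mistake.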