prince
mca: "ValueError: dimension mismatch"
After training a scikit-learn pipeline with MCA, I try to use it on the test set (see the code below) and get "ValueError: dimension mismatch" (the full log is further down).
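import numpy as np
import prince
from sklearn.linear_model import ElasticNet
from sklearn.metrics import mean_absolute_error
from sklearn.pipeline import Pipeline

# TypeSelector is a custom transformer and X_train, X_test, y_train, y_test
# are defined elsewhere (not shown here).
mca = prince.MCA(
    n_components=20,
    n_iter=3,
    copy=True,
    check_input=True,
    engine="auto",
    random_state=42,
)
enet = ElasticNet()
pipe_mca = Pipeline(
    [("mca", mca), ("type", TypeSelector(np.number)), ("enet", enet)]
)
pipe_mca.fit(X_train[["Country", "FormalEducation"]], y_train)
# Inspect the output of the transformer steps only (everything before the regressor)
Pipeline(pipe_mca.steps[:-1]).transform(X_train[["Country", "FormalEducation"]]).head()
print(
    "MAE in train set for MCA: ",
    mean_absolute_error(pipe_mca.predict(X_train[["Country", "FormalEducation"]]), y_train)
)
print(
    "MAE in test set for MCA: ",
    mean_absolute_error(pipe_mca.predict(X_test[["Country", "FormalEducation"]]), y_test)
)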
I get the following error:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-35-3f35a6c3613b> in <module>
1 print(
2 "MAE in test set for MCA: ",
----> 3 mean_absolute_error(pipe_mca.predict(X_test[["Country", "FormalEducation"]]), y_test)
4 )
/opt/anaconda3/lib/python3.7/site-packages/sklearn/utils/metaestimators.py in <lambda>(*args, **kwargs)
114
115 # lambda, but not partial, allows help() to work with update_wrapper
--> 116 out = lambda *args, **kwargs: self.fn(obj, *args, **kwargs)
117 # update the docstring of the returned function
118 update_wrapper(out, self.fn)
/opt/anaconda3/lib/python3.7/site-packages/sklearn/pipeline.py in predict(self, X, **predict_params)
417 Xt = X
418 for _, name, transform in self._iter(with_final=False):
--> 419 Xt = transform.transform(Xt)
420 return self.steps[-1][-1].predict(Xt, **predict_params)
421
/opt/anaconda3/lib/python3.7/site-packages/prince/mca.py in transform(self, X)
48 if self.check_input:
49 utils.check_array(X, dtype=[str, np.number])
---> 50 return self.row_coordinates(X)
51
52 def plot_coordinates(self, X, ax=None, figsize=(6, 6), x_component=0, y_component=1,
/opt/anaconda3/lib/python3.7/site-packages/prince/mca.py in row_coordinates(self, X)
36 if not isinstance(X, pd.DataFrame):
37 X = pd.DataFrame(X)
---> 38 return super().row_coordinates(pd.get_dummies(X))
39
40 def column_coordinates(self, X):
/opt/anaconda3/lib/python3.7/site-packages/prince/ca.py in row_coordinates(self, X)
132
133 return pd.DataFrame(
--> 134 data=X @ sparse.diags(self.col_masses_.to_numpy() ** -0.5) @ self.V_.T,
135 index=row_names
136 )
/opt/anaconda3/lib/python3.7/site-packages/scipy/sparse/base.py in __rmatmul__(self, other)
568 raise ValueError("Scalar operands are not allowed, "
569 "use '*' instead")
--> 570 return self.__rmul__(other)
571
572 ####################
/opt/anaconda3/lib/python3.7/site-packages/scipy/sparse/base.py in __rmul__(self, other)
552 except AttributeError:
553 tr = np.asarray(other).transpose()
--> 554 return (self.transpose() * tr).transpose()
555
556 #####################################
/opt/anaconda3/lib/python3.7/site-packages/scipy/sparse/base.py in __mul__(self, other)
518
519 if other.shape[0] != self.shape[1]:
--> 520 raise ValueError('dimension mismatch')
521
522 result = self._mul_multivector(np.asarray(other))
ValueError: dimension mismatch
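The last prince frame in the traceback calls pd.get_dummies(X) on whatever is passed to transform. A likely cause of the mismatch, though not confirmed in this thread, is that the test set does not contain exactly the same categories as the training set, so the dummy-encoded matrix has a different number of columns than the one seen during fit. The sketch below uses made-up toy data (the values "US", "DE", "IN", "FR" are hypothetical) to show the effect and one way to align the columns with plain pandas. It only illustrates the mechanism; it is not a patch for prince.

```python
import pandas as pd

# Hypothetical toy data: the test split contains a category ("FR") that was
# never seen in training, and lacks two that were ("DE", "IN").
X_train = pd.DataFrame({"Country": ["US", "DE", "IN", "US"]})
X_test = pd.DataFrame({"Country": ["US", "FR"]})

# Dummy-encode each split independently, as happens when get_dummies is
# called separately at fit time and at transform time.
train_dummies = pd.get_dummies(X_train, dtype=int)
test_dummies = pd.get_dummies(X_test, dtype=int)

# The two encodings do not share the same columns, so any matrix fitted on
# the training columns cannot be multiplied with the test encoding.
print(list(train_dummies.columns))
print(list(test_dummies.columns))

# Align the test encoding to the training columns: unseen categories are
# dropped and categories missing from the test split are filled with 0.
aligned = test_dummies.reindex(columns=train_dummies.columns, fill_value=0)
print(aligned.shape == (len(X_test), train_dummies.shape[1]))
```

Note that after alignment a row holding only unseen categories becomes all zeros, which may or may not be acceptable depending on the downstream model.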
I'm getting the same error when transforming a test dataset. Have you found a workaround?
Not really. After some testing I realized that I could only predict on the training set; the current function did not generalize to new data.
I tried some other encoders that gave me better results on the training set (I can't say for the test set, but I expect the same): https://contrib.scikit-learn.org/category_encoders/
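As a rough illustration of the encoder route, here is a minimal sketch that swaps MCA for scikit-learn's own OneHotEncoder with handle_unknown="ignore". This is not the category_encoders package linked above and it does no dimensionality reduction. The encoder learns its categories during fit, so the test set is always encoded with the same number of columns. The data values below are made up for the example.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import ElasticNet
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Hypothetical toy data; "FR" appears only in the test split.
X_train = pd.DataFrame({
    "Country": ["US", "DE", "IN", "US"],
    "FormalEducation": ["BSc", "MSc", "BSc", "PhD"],
})
y_train = np.array([10.0, 12.0, 8.0, 15.0])
X_test = pd.DataFrame({
    "Country": ["US", "FR"],
    "FormalEducation": ["MSc", "BSc"],
})

pipe = Pipeline([
    # Categories are learned at fit time; unknown ones encode as all zeros.
    ("onehot", OneHotEncoder(handle_unknown="ignore")),
    ("enet", ElasticNet()),
])
pipe.fit(X_train, y_train)

# The test encoding has the same width as the training encoding.
encoded_test = pipe.named_steps["onehot"].transform(X_test)
print(encoded_test.shape)

# Prediction on the test split no longer fails on a column mismatch.
preds = pipe.predict(X_test)
print(preds.shape)
```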
See if this helps:
https://github.com/MaxHalford/prince/issues/107#issuecomment-768144230
Hello there 👋
I apologise for not answering earlier. I was not maintaining Prince anymore. However, I have just refactored the entire codebase. This refactoring should have fixed many bugs.
I don't have time and energy to check if this fixes your issue, but there is a good chance it does. Feel free to reopen this issue if the problem persists after installing the new version, that is, version 0.8.0 and onwards.