fastFM
Train/Test of different column dimension
Hi,
I was trying to train the factorization machine using a dataset with X_train and X_test where X_test.shape[1] < X_train.shape[1]. However, I could not proceed with training because of the following assertion:
assert X_test.shape[1] == len(self.w_)
Since self.w_ has a length initialized from X_train, X_test will fail this check in this case. It is perfectly reasonable to me that the number of columns in X_test could be less than or equal to the number of columns in X_train. The workaround is to zero-pad X_test on the right using the scipy.sparse.hstack function, which should not be necessary.
Is there any motivation for why this assertion should continue to exist? If the shape of X_test is necessary for the fastFM-core, perhaps we could perform the test and zero-pad the matrix if necessary?
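For reference, the zero-padding workaround described above might look roughly like the following. This is a minimal sketch, not code from fastFM: the toy matrices and the choice of CSC format are assumptions for illustration, and padding on the right is only correct if the columns missing from X_test are the trailing ones of X_train.

```python
import numpy as np
import scipy.sparse as sp

# Toy data (assumed for illustration): X_train has 3 columns, X_test only 2.
X_train = sp.csc_matrix(np.array([[1.0, 0.0, 2.0],
                                  [0.0, 3.0, 0.0]]))
X_test = sp.csc_matrix(np.array([[4.0, 5.0]]))

# Number of trailing columns that X_test lacks relative to X_train.
n_missing = X_train.shape[1] - X_test.shape[1]

# An all-zero sparse block with the same number of rows as X_test.
zeros = sp.csc_matrix((X_test.shape[0], n_missing), dtype=X_test.dtype)

# Append the zero block on the right so the column counts match.
X_test_padded = sp.hstack([X_test, zeros], format="csc")
```

After this step X_test_padded has the same number of columns as X_train, so the shape assertion would pass.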
> It is perfectly reasonable to me that the number of columns in X_test could be less than or equal to the number of columns in X_train.
While it's reasonable, it still looks like an edge case to me. Failing this assert is a strong indication of a bug in the feature engineering code.
I would suggest at least issuing a warning if the zero-padding is done automatically. Thoughts?
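A rough sketch of what "warn, then pad automatically" could look like is given below. The helper name check_or_pad and its n_features argument are hypothetical and not part of fastFM; the sketch also assumes that extra columns in the test matrix should still be treated as an error.

```python
import warnings

import scipy.sparse as sp


def check_or_pad(X, n_features):
    """Hypothetical helper: return X with exactly n_features columns.

    Pads missing trailing columns with zeros and emits a warning,
    since a column mismatch often points to a feature engineering bug.
    """
    n_cols = X.shape[1]
    if n_cols > n_features:
        # More columns than the model was trained on cannot be fixed by padding.
        raise ValueError(
            "X has %d columns, but the model expects %d" % (n_cols, n_features)
        )
    if n_cols < n_features:
        warnings.warn(
            "X has %d columns, expected %d; zero-padding on the right"
            % (n_cols, n_features),
            UserWarning,
        )
        zeros = sp.csc_matrix((X.shape[0], n_features - n_cols), dtype=X.dtype)
        X = sp.hstack([X, zeros], format="csc")
    return X
```

In this sketch, n_features would correspond to len(self.w_) from the assertion, so the check could run in the same place the assert runs today.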
I also think that a warning is better than an assert.