
Train/Test of different column dimension

Open Hydrotoast opened this issue 8 years ago • 2 comments

Hi,

I was trying to train the factorization machine using a dataset with X_train and X_test where X_test.shape[1] < X_train.shape[1]. However, I could not proceed with training because of the following assertion:

assert X_test.shape[1] == len(self.w_)

Since the length of self.w_ is set from X_train, an X_test with fewer columns fails this check. It seems perfectly reasonable to me that X_test could have fewer columns than X_train, or the same number. The workaround is to zero-pad X_test on the right using the scipy.sparse.hstack function, but that step should not be necessary.
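For reference, the zero-padding workaround described above can be sketched as follows. This is a minimal illustration rather than fastFM code: the toy matrices and the names n_missing, padding, and X_test_padded are assumptions made for the example, and it assumes the missing columns are the trailing ones, so that padding on the right lines the features up with X_train.

```python
import numpy as np
import scipy.sparse as sp

# Toy data (illustrative only): X_train has 4 columns, X_test only 2.
X_train = sp.csc_matrix(np.array([[1.0, 0.0, 2.0, 0.0],
                                  [0.0, 3.0, 0.0, 4.0]]))
X_test = sp.csc_matrix(np.array([[5.0, 0.0],
                                 [0.0, 6.0],
                                 [7.0, 0.0]]))

# Number of trailing columns that X_test lacks relative to X_train.
n_missing = X_train.shape[1] - X_test.shape[1]

# An all-zero sparse block with the same number of rows as X_test.
padding = sp.csc_matrix((X_test.shape[0], n_missing), dtype=X_test.dtype)

# Append the zero block on the right so the column counts match.
X_test_padded = sp.hstack([X_test, padding], format="csc")
```

After this step X_test_padded.shape[1] equals X_train.shape[1], so the assertion above would pass.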

Is there a reason this assertion needs to stay? If fastFM-core requires X_test to have that shape, perhaps we could perform the check and zero-pad the matrix when necessary?

Hydrotoast, Jun 10 '16 23:06

> It is perfectly reasonable to me that the number of columns in X_test could be less than or equal to the number of columns in X_train.

While it's reasonable, it still looks like an edge case to me. Failing this assert is a strong indication of a bug in the feature engineering code.

I would suggest at least issuing a warning if the zero-padding is done automatically. Thoughts?
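A rough sketch of what "pad automatically, but warn" could look like is below. The function pad_columns_with_warning and its parameters are hypothetical and are not part of fastFM; the sketch also assumes that an X with more columns than the model expects should still be rejected.

```python
import warnings

import numpy as np
import scipy.sparse as sp


def pad_columns_with_warning(X, n_features):
    """Hypothetical helper (not fastFM API): right-pad X with zero columns.

    n_features stands in for len(self.w_), the width the model was fit on.
    """
    n_cols = X.shape[1]
    if n_cols > n_features:
        # More columns than the model knows about cannot be fixed by padding.
        raise ValueError(
            "X has %d columns, expected at most %d" % (n_cols, n_features)
        )
    if n_cols == n_features:
        # Nothing to do; return the input unchanged.
        return X
    # Fewer columns: warn, since this may point to a feature engineering bug.
    warnings.warn(
        "X has %d columns but the model expects %d; zero-padding on the right."
        % (n_cols, n_features),
        UserWarning,
    )
    padding = sp.csc_matrix((X.shape[0], n_features - n_cols), dtype=X.dtype)
    return sp.hstack([X, padding], format="csc")
```

A caller would pass the test matrix and the number of features seen during training, for example pad_columns_with_warning(X_test, X_train.shape[1]).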

ibayer, Jun 17 '16 15:06

I agree that a warning is better than an assert.

Hydrotoast, Jun 17 '16 16:06