fastFM
Can OneVsRestClassifier be used to extend FMClassification to the multi-class case?
As fastFM-core only accepts {1, -1} as class labels and there is no predict_proba or decision_function, I was wondering whether it can fully satisfy the expectations of OneVsRestClassifier. If so, is there any tip or solution for applying sgd FMClassification to a multi-class problem?
Actually, there is a predict_proba function for the als and sgd solvers, but it's not showing up in the docs (there is an open issue now). In general, I would recommend using the mcmc solver with the fit_predict_proba function.
#47
Thank you so much. I was thinking of implementing customized predict_proba and fit functions for sgd to solve the problem, but you said that there is an open issue!
Although I would prefer to use the sgd method, I also tried the mcmc solver as follows:
y_proba = OneVsRestClassifier(fm).fit_predict_proba(X_train, y_train, X_test)
but I face the following error:
AttributeError: 'OneVsRestClassifier' object has no attribute 'fit_predict_proba'
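The error arises because fit_predict_proba is a method of the fastFM mcmc model, not of the sklearn wrapper. One possible workaround is to run the one-vs-rest loop by hand. The following is only a sketch: the helper name one_vs_rest_fit_predict_proba and the make_model factory are hypothetical, and it assumes the model exposes fit_predict_proba(X_train, y_train, X_test) returning one probability per test row.

```python
import numpy as np


def one_vs_rest_fit_predict_proba(make_model, X_train, y_train, X_test):
    """Hypothetical helper: fit one binary model per class.

    Returns (classes, proba) where proba[i, k] is the score the k-th
    binary model assigns to test row i for classes[k].
    """
    y_train = np.asarray(y_train)
    classes = np.unique(y_train)
    columns = []
    for c in classes:
        # Binary target in {-1, +1}, as fastFM classifiers expect.
        y_binary = np.where(y_train == c, 1, -1)
        model = make_model()  # fresh, unfitted model for each class
        p = model.fit_predict_proba(X_train, y_binary, X_test)
        columns.append(np.asarray(p, dtype=float).ravel())
    return classes, np.column_stack(columns)


# Possible usage (assumes fastFM is installed; not run here):
# from fastFM import mcmc
# classes, proba = one_vs_rest_fit_predict_proba(
#     lambda: mcmc.FMClassification(n_iter=100, rank=2),
#     X_train, y_train, X_test)
# y_pred = classes[proba.argmax(axis=1)]
```

The per-class scores are not normalized across classes, so they are only meaningful for picking the argmax. With thousands of classes this means fitting thousands of models, so the cost grows linearly with the number of classes.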
Are you talking about the OneVsRestClassifier class from sklearn? I thought you wanted to implement it yourself. Please provide a Short, Self Contained, Correct Example to help us understand the issue.
Yes, I am facing a multi-class classification problem, and I was thinking of using OneVsRestClassifier from sklearn to wrap the sgd FMClassification model so that fastFM can handle it.
from fastFM import sgd
from sklearn.multiclass import OneVsRestClassifier
fm = sgd.FMClassification(n_iter=1000, init_stdev=0.1, rank=2, random_state=123, l2_reg_w=0, l2_reg_V=0, l2_reg=0, step_size=0.1)
OneVsRestClassifier(fm).fit(X_train, y_train)
With the above code I get the following error related to sklearn.base:
RuntimeError: Cannot clone object FMClassification(init_stdev=0.1, l2_reg=None, l2_reg_V=0, l2_reg_w=0, n_iter=1000, random_state=123, rank=2, step_size=0.1), as the constructor does not seem to set parameter l2_reg_V
I thought maybe this is because the sgd (or als) solver does not have a proper decision_function for OneVsRestClassifier?
That's why I asked this question. I was wondering whether implementing customized fit and predict_proba functions myself would be a plausible solution to this problem.
The "Cannot clone object FMClassification" error has been reported before: https://github.com/ibayer/fastFM/issues/44
> That's why I asked this question and I was wondering if implementing customized fit and predict_proba by myself can be a plausible solution for this problem?
I think it's better to fix this clone issue first; maybe that fixes your problem too. I'll look into it, but it might take a while. Your example doesn't run. You could adapt the code from http://scikit-learn.org/stable/auto_examples/plot_multilabel.html to create a self-contained example.
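For context on the clone error: sklearn's clone rebuilds an estimator from get_params() and then checks that the constructor stored each argument unchanged. The sketch below uses two hypothetical toy estimators (RewritesParam and StoresParam are made-up names, not fastFM classes) to show the kind of constructor that triggers this RuntimeError. It does not claim to reproduce what fastFM's __init__ actually does.

```python
from sklearn.base import BaseEstimator, clone


class RewritesParam(BaseEstimator):
    """Hypothetical estimator whose __init__ alters its argument."""

    def __init__(self, l2_reg_V=0.1):
        # The stored value differs from what was passed in,
        # which violates the contract that clone() checks.
        self.l2_reg_V = l2_reg_V * 2.0


class StoresParam(BaseEstimator):
    """Hypothetical estimator that stores its argument untouched."""

    def __init__(self, l2_reg_V=0.1):
        self.l2_reg_V = l2_reg_V


def can_clone(estimator):
    """Return True if sklearn.base.clone accepts the estimator."""
    try:
        clone(estimator)
        return True
    except RuntimeError:
        return False


print(can_clone(StoresParam(l2_reg_V=0.1)))
print(can_clone(RewritesParam(l2_reg_V=0.1)))
```

Since OneVsRestClassifier clones the base estimator once per class, any estimator that fails this check cannot be wrapped until its constructor stores every parameter as given.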
@ibayer I tried to solve the clone problem we faced. The problem was in the __init__() function in sgd.py; I just set the parameter values manually, though I am not sure whether that is a good way to do it.
But my question remains whether FMClassification can handle a problem with a huge number of classes, such as 1,000 to 100,000 classes, using OneVsRestClassifier, because I still get the following error:
sklearn.utils.validation.NotFittedError: This OneVsRestClassifier instance is not fitted yet. Call 'fit' with appropriate arguments before using this method.
I had changed _validate_class_labels(y) in base.py and set the assert on len(set(y)) to equal my maximum number of classes, so I am not sure whether this method can really work for multi-class classification using OneVsRestClassifier.
I was able to get the OneVsRestClassifier working with the following (not terribly elegant) patching:
import numpy as np
from fastFM import als
from sklearn.multiclass import OneVsRestClassifier

class FMClassifier(als.FMClassification):
    def fit(self, X, y, *args):
        # OneVsRestClassifier passes {0, 1} labels; fastFM expects {-1, 1}.
        y = y.copy()
        y[y == 0] = -1
        return super(FMClassifier, self).fit(X, y, *args)

    def predict_proba(self, X):
        # Repeat the 1-D probabilities into an (n_instances, 2) array.
        # Both columns hold the same values.
        probs = super(FMClassifier, self).predict_proba(X)
        return np.tile(probs, 2).reshape(2, probs.shape[0]).T

clf = OneVsRestClassifier(FMClassifier(n_iter=500, random_state=42), n_jobs=-1)
Basically, scikit-learn produces {0, 1} labels, which must be converted to {-1, 1} labels. Then it extracts the probabilities from what it assumes is an (n_instances, 2) array. Broadcasting the values would be better than my tiling solution, but I didn't know the syntax off the top of my head. There may be other changes needed to make the API fully compatible; I haven't tested this with a pipeline.
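As a possible alternative to the tiling, the two-column array can be built with np.column_stack so that column 0 holds the complement and column 1 the positive-class probability. This is a sketch only: two_column_proba is a hypothetical helper name, and it assumes the underlying predict_proba returns a 1-D array of positive-class probabilities.

```python
import numpy as np


def two_column_proba(p_positive):
    """Hypothetical helper: turn 1-D P(y=+1) into an (n, 2) array.

    Column 0 is P(y=-1) = 1 - p, column 1 is P(y=+1) = p, which is the
    layout sklearn meta-estimators read when they index [:, 1].
    """
    p_positive = np.asarray(p_positive, dtype=float).ravel()
    return np.column_stack([1.0 - p_positive, p_positive])


proba = two_column_proba([0.2, 0.9])
print(proba.shape)
```

Unlike the tiled version, each row here sums to 1, which matters if any downstream code reads column 0.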
I want to use fm to solve a multi-class classification problem too. Can you give me some advice? @farimahfanaei @ibayer @macks22