
Can "OneVsRestClassifier" be useful to adapt FMClassification to a multi-class classification case?

Open farimahfanaei opened this issue 8 years ago • 8 comments

As fastFM-core only accepts {1, -1} as class labels and there is no predict_proba or decision_function, I was wondering whether it can fully satisfy the expectations of OneVsRestClassifier. If so, is there any tip or solution so that I can apply sgd FMClassification to a multi-class problem?

farimahfanaei avatar Apr 19 '16 23:04 farimahfanaei

Actually, there is a predict_proba function for the als and sgd solvers, but it's not showing up in the docs (there is an open issue now). In general, I would recommend using the mcmc solver with the fit_predict_proba function.

#47

ibayer avatar Apr 20 '16 08:04 ibayer

Thank you so much. I was thinking of implementing customized predict_proba and fit functions for sgd to solve the problem, but you said there is an open issue. Although I would prefer the sgd method, I also tried the mcmc solver as follows: y_proba = OneVsRestClassifier(fm).fit_predict_proba(X_train, y_train, X_test), but I get the following error:

AttributeError: 'OneVsRestClassifier' object has no attribute 'fit_predict_proba'
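
The error is consistent with OneVsRestClassifier exposing only the usual scikit-learn methods (fit, predict, predict_proba and so on), so a fit_predict_proba call on the wrapper has nothing to dispatch to. One possible workaround is to run the one-vs-rest loop by hand. The following is only a sketch under stated assumptions: ovr_fit_predict_proba, make_model and PriorStub are hypothetical names that are not part of fastFM or scikit-learn, and the stub merely stands in for any model offering fit_predict_proba(X_train, y_train, X_test).

```python
import numpy as np


def ovr_fit_predict_proba(make_model, X_train, y_train, X_test):
    """Hypothetical helper: one-vs-rest by hand for fit_predict_proba models.

    make_model() must return a fresh object whose
    fit_predict_proba(X_train, y_train, X_test) returns one positive-class
    probability per test row.
    """
    y_train = np.asarray(y_train)
    classes = np.unique(y_train)
    columns = []
    for c in classes:
        # Binary target in {-1, +1}: the current class against all others.
        y_bin = np.where(y_train == c, 1, -1)
        columns.append(make_model().fit_predict_proba(X_train, y_bin, X_test))
    # One column per class, shape (n_test, n_classes).
    return classes, np.column_stack(columns)


class PriorStub:
    """Stand-in model (NOT fastFM): predicts the positive-label frequency."""

    def fit_predict_proba(self, X_train, y_train, X_test):
        p = np.mean(np.asarray(y_train) == 1)
        return np.full(len(X_test), p)


X_train = np.zeros((4, 2))
X_test = np.zeros((3, 2))
classes, proba = ovr_fit_predict_proba(PriorStub, X_train, [0, 0, 1, 2], X_test)
y_pred = classes[np.argmax(proba, axis=1)]  # most probable class per test row
```

With fastFM, make_model would presumably be something like a lambda returning a new mcmc classifier; that combination is untested here. The per-class probabilities come from independent binary models, so the rows of proba need not sum to one.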

farimahfanaei avatar Apr 21 '16 08:04 farimahfanaei

Are you talking about the OneVsRestClassifier class from sklearn? I thought you wanted to implement it yourself. Please provide a Short, Self Contained, Correct Example to help us understand the issue.

ibayer avatar Apr 21 '16 08:04 ibayer

Yes, I am facing a multi-class classification problem and I was thinking of using OneVsRestClassifier from sklearn to adapt sgd FMClassification to it with fastFM.

fm = sgd.FMClassification(n_iter=1000, init_stdev=0.1, rank=2, random_state=123,l2_reg_w=0, l2_reg_V=0, l2_reg=0, step_size=0.1)

OneVsRestClassifier(fm).fit(X_train,y_train)

With the above code I get the following error related to sklearn.base:

RuntimeError: Cannot clone object FMClassification(init_stdev=0.1, l2_reg=None, l2_reg_V=0, l2_reg_w=0, n_iter=1000, random_state=123, rank=2, step_size=0.1), as the constructor does not seem to set parameter l2_reg_V

I thought maybe it is because the sgd (or als) solver does not have a proper decision_function for OneVsRestClassifier. That's why I asked this question, and I was wondering whether implementing customized fit and predict_proba myself could be a plausible solution for this problem.

farimahfanaei avatar Apr 21 '16 09:04 farimahfanaei

The Cannot clone object FMClassification error has been reported before https://github.com/ibayer/fastFM/issues/44 .

> That's why I asked this question and I was wondering if implementing customized fit and predict_proba by myself can be a plausible solution for this problem?

I think it's better to fix this clone issue first; maybe that fixes your problem too. I'll look into it, but it might take a while. Your example doesn't run. You could just adapt the code from http://scikit-learn.org/stable/auto_examples/plot_multilabel.html to create a self-contained example.
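
For context on the Cannot clone object error: scikit-learn clones an estimator by reading its parameters, passing them to the constructor again, and checking that the new object reports the same values. The sketch below mimics that check in plain Python. It is a simplified stand-in, not the code in sklearn.base, and GoodEstimator and BrokenEstimator are contrived, hypothetical classes; it makes no claim about what exactly goes wrong inside fastFM.

```python
def simple_clone(estimator):
    """Simplified stand-in for the consistency check done when cloning."""
    params = estimator.get_params()
    new = type(estimator)(**params)
    new_params = new.get_params()
    for name, value in params.items():
        # The constructor must store each argument unchanged.
        if new_params[name] is not value:
            raise RuntimeError(
                "Cannot clone object %r, as the constructor does not seem "
                "to set parameter %s" % (estimator, name)
            )
    return new


class GoodEstimator:
    def __init__(self, l2_reg_w=0.0, l2_reg_V=0.0):
        # Arguments are stored as-is under their own names.
        self.l2_reg_w = l2_reg_w
        self.l2_reg_V = l2_reg_V

    def get_params(self):
        return {"l2_reg_w": self.l2_reg_w, "l2_reg_V": self.l2_reg_V}


class BrokenEstimator(GoodEstimator):
    def __init__(self, l2_reg_w=0.0, l2_reg_V=0.0):
        self.l2_reg_w = l2_reg_w
        # Deliberately contrived: the argument is modified before storing.
        self.l2_reg_V = l2_reg_V * 2


ok = simple_clone(GoodEstimator(l2_reg_V=0.5))

try:
    simple_clone(BrokenEstimator(l2_reg_V=0.5))
    clone_failed = False
except RuntimeError:
    clone_failed = True
```

The practical takeaway is the scikit-learn convention that __init__ should only assign its arguments to attributes of the same name and leave any derivation or validation to fit.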

ibayer avatar Apr 21 '16 11:04 ibayer

@ibayer I tried to solve the clone problem we faced; the problem was in the __init__() function in sgd.py. I just feed the parameter values manually, though I am not sure whether that is a good way to do it. But my question remains whether FMClassification can solve a problem with a huge number of classes, like 1000-100000 classes, using OneVsRestClassifier, because I still get the following error:

sklearn.utils.validation.NotFittedError: This OneVsRestClassifier instance is not fitted yet. Call 'fit' with appropriate arguments before using this method.

I changed _validate_class_labels(y) in base.py so that the assert on len(set(y)) allows my maximum number of classes, so I am unsure whether this method can really work for multi-class classification using OneVsRestClassifier.

farimahfanaei avatar Aug 12 '16 13:08 farimahfanaei

I was able to get the OneVsRestClassifier working with the following (not terribly elegant) patching:

import numpy as np
from fastFM import als
from sklearn.multiclass import OneVsRestClassifier


class FMClassifier(als.FMClassification):
    def fit(self, X, y, *args):
        # OneVsRestClassifier passes {0, 1} labels; fastFM expects {-1, 1}.
        y = y.copy()
        y[y == 0] = -1
        return super(FMClassifier, self).fit(X, y, *args)

    def predict_proba(self, X):
        # Repeat the 1-D probabilities into the (n_samples, 2) shape
        # that OneVsRestClassifier expects.
        probs = super(FMClassifier, self).predict_proba(X)
        return np.tile(probs, 2).reshape(2, probs.shape[0]).T


clf = OneVsRestClassifier(FMClassifier(n_iter=500, random_state=42), n_jobs=-1)

Basically, scikit produces [0, 1] labels which must be converted to [-1, 1] labels. Then it extracts the probabilities from what it assumes is a (n_instances, 2) array. Broadcasting the values would be better than my tiling solution, but I didn't know the syntax off the top of my head. There may be other changes needed to make the API fully compatible; I haven't tested this with a pipeline.
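
As a possible alternative to the tiling step, the two columns can be built directly so that each row reads as (P(negative), P(positive)). This is a minimal sketch assuming predict_proba returns a 1-D array of positive-class probabilities, as the patch above implies; to_two_columns is a hypothetical helper name.

```python
import numpy as np


def to_two_columns(p_pos):
    """Turn 1-D positive-class probabilities into an (n_samples, 2) array."""
    p_pos = np.asarray(p_pos, dtype=float)
    # Column 0: probability of the negative class, column 1: positive class.
    return np.column_stack([1.0 - p_pos, p_pos])


probs = np.array([0.9, 0.2, 0.5])
two_col = to_two_columns(probs)

# The tiling approach from the patch, for comparison: both columns hold probs.
tiled = np.tile(probs, 2).reshape(2, probs.shape[0]).T
```

Both arrays agree in column 1, which is the positive-class column; the difference is that the rows of two_col also sum to one.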

macks22 avatar Mar 20 '17 00:03 macks22

I want to use FM to solve a multi-class classification problem too. Can you give some advice? @farimahfanaei @ibayer @macks22

Darinyazanr avatar Mar 29 '18 03:03 Darinyazanr