scikit-learn Bad error messages in ClassifierChain on multioutput multiclass

Trying to run ClassifierChain on a multiclass problem with a reshaped y (which is interpreted as multioutput multiclass) doesn't work (see #9245), which is fine. But the error messages are really bad, which makes it hard to understand what's going on.

We should detect that we're trying to do multioutput multiclass instead of multilabel and give an informative error.

Feb 28 '19 16:02 amueller

Hey, I'm pretty new. Can I take a look at this and try to fix it?

Feb 28 '19 21:02 EdinCitaku

This should reproduce the issue:

from sklearn.multioutput import ClassifierChain
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import make_classification
import numpy as np

Xs, ys = zip(*(make_classification(random_state=i, n_classes=3, n_informative=3) for i in range(3)))
X = np.hstack(Xs)
Y = np.transpose(ys)
ClassifierChain(LogisticRegression()).fit(X, Y).predict(X)

But this actually runs without complaint for me!

@amueller, could you please provide a reproducible snippet?

Mar 01 '19 06:03 jnothman

Ah, it's only decision_function that's broken. If predict works, maybe fixing decision_function is reasonable? I didn't read #9245 in detail; it looks like the predict part is already done in master?

from sklearn.multioutput import ClassifierChain
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import make_classification
import numpy as np

Xs, ys = zip(*(make_classification(random_state=i, n_classes=3, n_informative=3) for i in range(3)))
X = np.hstack(Xs)
Y = np.transpose(ys)
ClassifierChain(LogisticRegression()).fit(X, Y).decision_function(X)

Mar 01 '19 23:03 amueller
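To see where the unhelpful error comes from, here is a plain NumPy sketch. It assumes, as the Y_decision_chain[:, chain_idx] assignment quoted later in this thread suggests, that decision_function preallocates one column per chain step; the array sizes are made up for illustration. A multiclass estimator's decision_function returns an (n_samples, n_classes) array, which cannot be written into a single column.

```python
import numpy as np

n_samples, n_outputs, n_classes = 5, 3, 3
# One column per chain step, as in the multilabel case.
Y_decision_chain = np.zeros((n_samples, n_outputs))
# Stand-in for a multiclass estimator's decision_function: one column per class.
scores = np.ones((n_samples, n_classes))

error_message = None
try:
    Y_decision_chain[:, 0] = scores
except ValueError as exc:
    error_message = str(exc)
print(error_message)
```

The resulting broadcasting error only mentions array shapes and says nothing about multiclass-multioutput targets, which is why it is hard to act on.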

Maybe we should try fixing decision_function and predict_proba, and if that's too hard, raise an AttributeError in the multioutput multiclass case?

Mar 01 '19 23:03 amueller

If we raise an AttributeError, is it enough to add an if statement that calls type_of_target on Y? Something along the lines of:

if type_of_target(Y_decision_chain) == 'multiclass-multioutput':
    raise AttributeError("decision_function currently does not support multiclass-multioutput as target type")

Mar 02 '19 20:03 EdinCitaku

It shouldn't be hard to implement support, I think. type_of_target will be easiest to apply at fit time.

Mar 03 '19 09:03 jnothman

So what do we need to add for this support? As I understand it, decision_function expects a 2D array when calling estimator.decision_function(X_aug), but since our target type is multiclass-multioutput, it gets a list of 2D arrays instead. Would it work to just add more columns to Y_decision_chain and fill them with the columns of each 2D array in that list?

Mar 04 '19 19:03 EdinCitaku
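One way to read the "more columns" idea: stack each output's decision values side by side and remember where each output's block starts. This is a sketch under assumptions; stack_decision_blocks is a hypothetical helper, and it changes the output of decision_function from one column per output to one column per (output, class) pair, which is a design decision this thread does not settle.

```python
import numpy as np


def stack_decision_blocks(blocks):
    """Concatenate per-output decision values column-wise.

    blocks: one array per chain step. A binary estimator's decision_function
    gives shape (n_samples,); a multiclass one gives (n_samples, n_classes_k).
    Returns the stacked matrix and the column offset of each output's block.
    """
    # Promote 1D binary scores to a single column so everything is 2D.
    columns = [b.reshape(-1, 1) if b.ndim == 1 else b for b in blocks]
    offsets = np.cumsum([0] + [c.shape[1] for c in columns])
    return np.hstack(columns), offsets


blocks = [np.ones((4, 3)), np.zeros(4), np.full((4, 2), 2.0)]
combined, offsets = stack_decision_blocks(blocks)
print(combined.shape)  # (4, 6)
print(offsets)  # output k occupies columns offsets[k]:offsets[k + 1]
```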

Something like that. You might need to use the classes_ attribute of the underlying estimators to get the columns lined up when cv might result in a class being missing from one training set or another.

Mar 04 '19 22:03 jnothman
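A sketch of the classes_ alignment, assuming each per-output estimator saw at least three classes (so its decision_function is 2D, with columns ordered like its classes_) and that all_classes is the sorted set of labels for that output, e.g. from np.unique. align_to_classes is a hypothetical helper; classes missing from an estimator's training set are filled with NaN here, though another fill value may suit downstream code better. The binary case (1D scores) is not handled.

```python
import numpy as np


def align_to_classes(decision, est_classes, all_classes, fill_value=np.nan):
    """Expand an estimator's decision columns to cover all_classes.

    decision: (n_samples, len(est_classes)) array, columns ordered like est_classes.
    all_classes: sorted array of every label seen for this output.
    """
    est_classes = np.asarray(est_classes)
    all_classes = np.asarray(all_classes)
    if not np.isin(est_classes, all_classes).all():
        raise ValueError("estimator saw a class that is not in all_classes")
    aligned = np.full((decision.shape[0], len(all_classes)), fill_value, dtype=float)
    # Because all_classes is sorted, searchsorted gives each class's column.
    cols = np.searchsorted(all_classes, est_classes)
    aligned[:, cols] = decision
    return aligned


# Class 1 was missing from this estimator's training fold.
decision = np.array([[1.0, 2.0, 3.0],
                     [4.0, 5.0, 6.0]])
aligned = align_to_classes(decision, est_classes=[0, 2, 3], all_classes=[0, 1, 2, 3])
print(aligned)  # column 1 (the missing class) is NaN
```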

So I looked at it again and made some modifications, but I'm really not sure if I understood it correctly. This is the code that I changed in decision_function:

decision_function_result = estimator.decision_function(X_aug)
if decision_function_result.ndim == 1:
    # binary estimator: one score per sample
    Y_decision_chain[:, chain_idx] = decision_function_result
else:
    # multiclass estimator: (n_samples, n_classes); keep a single column
    Y_decision_chain[:, chain_idx] = decision_function_result[:, chain_idx]

Basically, each classifier in the ClassifierChain is responsible for one column of the Y_decision_chain matrix. I threw away my previous idea of adding columns to Y_decision_chain, since the result still needs to conform to the output format of decision_function. Does this solve the problem at hand? I'm sorry if it's wrong or if I misunderstood you. I don't know how else to use each estimator's decision_function to fill Y_decision_chain, since each class is being predicted multiple times.

Mar 08 '19 00:03 EdinCitaku

Hi, I'm new and looking for an issue to work on. Is this issue still open?

May 14 '20 22:05 michellemroy

@michellemroy there's currently some work being done in #14654

May 29 '20 16:05 amueller

I'm new to the repo. Is this issue still open?

Mar 04 '21 15:03 sethiabhishek

Hi, can I work on this? I'm new so can someone guide me?

Aug 25 '21 17:08 dhivyasreedhar

Hello, can I use this issue as my first contribution? I'm new, so could someone please guide me?

Aug 29 '21 13:08 ankitasankars

I would like to contribute, but we should reassign tickets that are still open yet appear to be assigned.

Dec 15 '21 19:12 codecypher

If this issue hasn't been resolved already, I'd like to contribute! It would be my first open-source contribution.

Apr 28 '23 05:04 albonec

Hi everyone, I'm new to the open-source world and I want to contribute. Please point me to an easy good first issue to start with. It would be really helpful if someone could guide me.

Jun 19 '23 14:06 YashSaxena21

I am working on this!

Oct 05 '23 06:10 santiagoahl

Can anyone confirm whether this issue is still open?

May 09 '24 18:05 PragyanTiwari