
Multi-label classification

Open jayahm opened this issue 2 years ago • 10 comments

Hi

Can this library and its methods work with multi-label classification algorithms?

jayahm avatar May 17 '22 09:05 jayahm

@jayahm Hello,

I haven't tested it yet, but it should work with the multi-output module from scikit-learn, which transforms a general estimator into a multi-label classification (or regression) algorithm: https://scikit-learn.org/stable/modules/classes.html#module-sklearn.multioutput

In this case, it would be used together with ClassifierChain or MultiOutputClassifier.

Menelau avatar May 24 '22 23:05 Menelau

Hi

I have tested with ClassifierChain and got the following error:

y_self has the shape of (200, 6918), where 6918 is the number of labels (0-1 binarized).

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Input In [62], in <cell line: 3>()
      1 from deslib.dcs import OLA
      2 ola = OLA(pool_classifiers)
----> 3 ola.fit(X_val, y_val)
      4 ola_prediction = ola.predict(X_test, y_test)

File ~\anaconda3\lib\site-packages\deslib\base.py:207, in BaseDS.fit(self, X, y)
    204 self.random_state_ = check_random_state(self.random_state)
    206 # Check if the length of X and y are consistent.
--> 207 X, y = check_X_y(X, y)
    209 # Check if the pool of classifiers is None.
    210 # If yes, use a BaggingClassifier for the pool.
    211 if self.pool_classifiers is None:

File ~\anaconda3\lib\site-packages\sklearn\utils\validation.py:63, in _deprecate_positional_args.<locals>._inner_deprecate_positional_args.<locals>.inner_f(*args, **kwargs)
     61 extra_args = len(args) - len(all_args)
     62 if extra_args <= 0:
---> 63     return f(*args, **kwargs)
     65 # extra_args > 0
     66 args_msg = ['{}={}'.format(name, arg)
     67             for name, arg in zip(kwonly_args[:extra_args],
     68                                  args[-extra_args:])]

File ~\anaconda3\lib\site-packages\sklearn\utils\validation.py:826, in check_X_y(X, y, accept_sparse, accept_large_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, multi_output, ensure_min_samples, ensure_min_features, y_numeric, estimator)
    823     y = check_array(y, accept_sparse='csr', force_all_finite=True,
    824                     ensure_2d=False, dtype=None)
    825 else:
--> 826     y = column_or_1d(y, warn=True)
    827     _assert_all_finite(y)
    828 if y_numeric and y.dtype.kind == 'O':

File ~\anaconda3\lib\site-packages\sklearn\utils\validation.py:63, in _deprecate_positional_args.<locals>._inner_deprecate_positional_args.<locals>.inner_f(*args, **kwargs)
     61 extra_args = len(args) - len(all_args)
     62 if extra_args <= 0:
---> 63     return f(*args, **kwargs)
     65 # extra_args > 0
     66 args_msg = ['{}={}'.format(name, arg)
     67             for name, arg in zip(kwonly_args[:extra_args],
     68                                  args[-extra_args:])]

File ~\anaconda3\lib\site-packages\sklearn\utils\validation.py:864, in column_or_1d(y, warn)
    858         warnings.warn("A column-vector y was passed when a 1d array was"
    859                       " expected. Please change the shape of y to "
    860                       "(n_samples, ), for example using ravel().",
    861                       DataConversionWarning, stacklevel=2)
    862     return np.ravel(y)
--> 864 raise ValueError(
    865     "y should be a 1d array, "
    866     "got an array of shape {} instead.".format(shape))

ValueError: y should be a 1d array, got an array of shape (200, 6918) instead.

jayahm avatar Jun 05 '22 06:06 jayahm

Hello,

Can you provide a small code example that reproduces this error? Then I can see what can be done.

Menelau avatar Jun 06 '22 03:06 Menelau

Hi

Thanks for your response.

I have created a simple code example here:

https://www.dropbox.com/s/soaysxi2rhhj388/for_deslib.zip?dl=0

jayahm avatar Jun 06 '22 13:06 jayahm

Hi

Were you able to run my code?

I really hope there is a way to perform multi-label classification using this library.

jayahm avatar Jun 17 '22 01:06 jayahm

@jayahm Hello,

According to the example you provided, you want each base model to be a multi-label classifier and to select the best among them for each new sample, correct? If that is the case, there is no support for that. To the best of my knowledge, there is no dynamic ensemble technique that performs classifier selection over multi-label models. We would need to develop a new technique first and then add it, since it would involve adapting several steps of the pipeline to this context (region of competence definition, competence estimation, selection scheme, and combination). I spent some time looking for such a technique in the literature but did not find any, so there is huge potential for interesting research there...
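For readers unfamiliar with the four steps named above, here is a minimal single-label sketch of them in plain Python. It is a toy local-accuracy selection, not DESlib's implementation: the function names, the toy pool of threshold rules, and the data are all hypothetical and chosen only for illustration. Each step is a place where a multi-label variant would need its own definition.

```python
from collections import Counter
import math


def region_of_competence(x, X_dsel, k):
    """Step 1: indices of the k DSEL samples closest to x (Euclidean distance)."""
    order = sorted(range(len(X_dsel)), key=lambda i: math.dist(x, X_dsel[i]))
    return order[:k]


def local_accuracy(clf, X_dsel, y_dsel, region):
    """Step 2: competence = fraction of region samples the classifier gets right."""
    hits = sum(clf(X_dsel[i]) == y_dsel[i] for i in region)
    return hits / len(region)


def select_and_predict(x, pool, X_dsel, y_dsel, k=3):
    region = region_of_competence(x, X_dsel, k)
    scores = [local_accuracy(clf, X_dsel, y_dsel, region) for clf in pool]
    best = max(scores)
    # Step 3: keep every classifier that reaches the best local accuracy.
    selected = [clf for clf, s in zip(pool, scores) if s == best]
    # Step 4: combine the selected classifiers by majority vote.
    votes = Counter(clf(x) for clf in selected)
    return votes.most_common(1)[0][0]


# Hypothetical toy pool: each "classifier" is a plain function returning 0 or 1.
pool = [
    lambda x: int(x[0] > 0.5),  # thresholds the first feature
    lambda x: int(x[1] > 0.5),  # thresholds the second feature
    lambda x: 0,                # always predicts class 0
]

X_dsel = [(0.1, 0.9), (0.2, 0.8), (0.3, 0.7), (0.9, 0.1), (0.8, 0.2), (0.7, 0.3)]
y_dsel = [1, 1, 1, 0, 0, 0]  # the label follows the second feature

prediction = select_and_predict((0.15, 0.85), pool, X_dsel, y_dsel, k=3)
```

With a single label, "correct" in step 2 is a yes/no question per sample. With a label vector per sample, it is no longer obvious what counts as a hit, which is one reason the adaptation is not mechanical.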

If what you want is just a classical DS technique (which works as a single-label classifier) turned into a multi-label classifier through a standard multi-label decomposition (binary relevance or classifier chain), you can use it like this:

from sklearn.datasets import make_multilabel_classification
from deslib.des import KNORAE
from sklearn.model_selection import train_test_split
from sklearn.multioutput import ClassifierChain

X, Y = make_multilabel_classification(n_samples=1000, n_classes=5, random_state=0)
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, random_state=0)

knorae = KNORAE(random_state=42)
chain = ClassifierChain(knorae, order='random', random_state=0)
chain.fit(X_train, Y_train).predict(X_test)
chain.predict_proba(X_test)
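To make the decomposition idea concrete, the following is a rough plain-Python sketch of what a classifier chain does conceptually: one single-label model per label, where model j also receives labels 0..j-1 as extra features (true labels when fitting, earlier predictions when predicting). The `OneNN` class is a hypothetical stand-in for any single-label classifier, such as a DS method. The fixed label order is a simplification (the snippet above uses `order='random'`), and this is not scikit-learn's actual code.

```python
import math


class OneNN:
    """Hypothetical stand-in for any single-label classifier (e.g. a DS method)."""

    def fit(self, X, y):
        self.X_, self.y_ = list(X), list(y)
        return self

    def predict_one(self, x):
        nearest = min(range(len(self.X_)), key=lambda i: math.dist(x, self.X_[i]))
        return self.y_[nearest]


def fit_chain(X, Y):
    """Fit one single-label model per label; model j also sees labels 0..j-1."""
    models = []
    for j in range(len(Y[0])):
        # Augment the features with the true values of the earlier labels.
        X_aug = [tuple(x) + tuple(y[:j]) for x, y in zip(X, Y)]
        models.append(OneNN().fit(X_aug, [y[j] for y in Y]))
    return models


def predict_chain(models, x):
    """Predict labels in order, feeding earlier predictions to later models."""
    labels = []
    for model in models:
        labels.append(model.predict_one(tuple(x) + tuple(labels)))
    return labels


X = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]
Y = [[0, 0], [0, 1], [1, 0], [1, 1]]  # label 0 follows feature 0, label 1 follows feature 1

models = fit_chain(X, Y)
pred = predict_chain(models, (0.9, 0.1))
```

Every model in the chain only ever sees a 1d target, which is why a single-label DS method can be plugged in as the base estimator. Dropping the feature augmentation (using `X` unchanged for every label) would give binary relevance instead.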

Menelau avatar Jun 17 '22 03:06 Menelau

> According to the example you provided, you want each base model to be a multilabel classifier and select the best between them according to each new sample correct?

Yes, very true.

> I spent some time looking if there exists any technique in the literature but did not find any, so there is a huge potential for interesting research there...

Yes, I couldn't find any either. I could tell from the beginning that this task is not straightforward, since many things need to be defined in the context of multi-label classification (region of competence definition, competence estimation, selection scheme, combination, etc.). One reason might be, for example, that the number of labels varies per sample: one sample can have 3 labels while another has 5. So I am not sure how that can be adapted to this library.
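As a small illustration of the varying-label-count point, here is a sketch of how label sets of different sizes map to fixed-width 0-1 rows, plus one possible per-sample agreement score. The Hamming-style score is only an assumption about how a graded notion of "correct" could look. It is not taken from DESlib or from any paper, and the label names are made up.

```python
def binarize(label_sets):
    """Map label sets of varying size to fixed-width 0-1 indicator rows."""
    classes = sorted(set().union(*label_sets))
    index = {c: i for i, c in enumerate(classes)}
    rows = []
    for labels in label_sets:
        row = [0] * len(classes)
        for label in labels:
            row[index[label]] = 1
        rows.append(row)
    return classes, rows


def hamming_score(y_true, y_pred):
    """Fraction of label positions on which two indicator rows agree."""
    agree = sum(t == p for t, p in zip(y_true, y_pred))
    return agree / len(y_true)


# Hypothetical samples: the first has 3 labels, the second has 5.
samples = [
    ["news", "sports", "europe"],
    ["news", "politics", "europe", "economy", "election"],
]
classes, rows = binarize(samples)
score = hamming_score(rows[0], rows[1])
```

After binarization both samples have the same width, so the shape problem goes away, but a prediction can now be partially right, and a competence estimate would have to decide how to count that.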

> If what you want is just to have a usual, classical DS technique (which works as a single-label classifier) that is transformed to perform multilabel classification with classical techniques that makes multi-label decomposition (binary relevance or classifier chain) you can just use like that:

Do you mean to first train multiple single-label classifiers as base classifiers (pool_classifiers) and then wrap KNORAE in a ClassifierChain?

pool_classifiers = [model_perceptron,
                    model_svc,
                    model_bayes,
                    model_tree,
                    model_knn]

knorae = KNORAE(pool_classifiers, random_state=42)
knorae.fit(X_dsel, y_dsel)

chain = ClassifierChain(knorae, order='random', random_state=0)

chain.fit(X_train, Y_train).predict(X_test)
chain.predict_proba(X_test)

jayahm avatar Jun 17 '22 04:06 jayahm

Hi

I tried your suggestion, but with a heterogeneous pool of classifiers, using the code I wrote above.

It seems like in order to train each classifier, it still needs a single-label dataset.

I think the code you suggested previously will generate bagging classifiers, right? Or, what are the base classifiers of that KNORAE you suggested?

jayahm avatar Jun 23 '22 07:06 jayahm

Yeah, it would generate a bagging classifier. Unfortunately, the current implementation does not allow a heterogeneous pool here, due to some limitations in how scikit-learn clones classifiers (issue #89). I have a workaround in mind, but it will take some time to make everything compatible with both libraries.

However, I just saw that there is a quite recent paper (published on June 20th) that proposes a DES method for multi-label classification (https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4145875). I will see if I can get their original code and add it to this library.

Menelau avatar Jun 29 '22 17:06 Menelau

Hi @Menelau

That sounds good. I'll check the mentioned paper. Thanks for sharing.

Hopefully, deslib will be capable of handling multi-label classification soon.

jayahm avatar Jul 03 '22 17:07 jayahm