[MRG] Keep order of variables in LabelEncoder

Open samuelduchesne opened this issue 3 years ago • 4 comments

The problem

The current behavior of the LabelEncoder is to sort the categories when the mapping is built. This happens because it uses np.unique, which returns a sorted array of unique values. See https://github.com/Elementa-Engineering/scikit-optimize/blob/master/skopt/space/transformers.py#L175-L177 and https://numpy.org/doc/stable/reference/generated/numpy.unique.html

For example:

>>> from skopt.space.space import Categorical

>>> c = Categorical(("c", "b", "a"), transform="label")
>>> c.transform(["a", "b", "c"])
[0, 1, 2]

Note that the returned labels are 0, 1 and 2, i.e. the categories are encoded in the order ("a", "b", "c") even though they were specified as ("c", "b", "a"). This can be counter-intuitive, especially when the order of the categories "means" something for the user.
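To make the cause concrete, here is a small sketch of what np.unique does to the category order. It uses plain NumPy only; the variable names are illustrative and are not taken from the skopt source.

```python
import numpy as np

# Categories in the order the user specified them.
categories = np.array(["c", "b", "a"])

# np.unique returns the unique values sorted, so the input order is lost.
sorted_classes = np.unique(categories)

# A mapping built from the sorted array: "a" -> 0, "b" -> 1, "c" -> 2.
sorted_mapping = {str(value): index for index, value in enumerate(sorted_classes)}

print(sorted_classes.tolist())
print(sorted_mapping)
```

Any encoder that derives its labels from this sorted array will ignore the order passed to Categorical.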

Implemented Fix

This PR implements a simple fix that retains the order of the categorical dimensions. The expected behavior then becomes:

>>> from skopt.space.space import Categorical

>>> c = Categorical(("c", "b", "a"), transform="label")
>>> c.transform(["a", "b", "c"])
[2, 1, 0]

The order is preserved.

The same goes for numerical values:

>>> from skopt.space.space import Categorical

>>> c = Categorical((10, 30, 20), transform="label")
>>> c.transform([10, 20, 30])
[0, 2, 1]
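For reviewers, the idea can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: OrderedLabelEncoder and its attribute names are hypothetical and are not the code in this PR or the skopt API. It builds the mapping from first-seen order instead of from a sorted array.

```python
class OrderedLabelEncoder:
    """Hypothetical order-preserving label encoder (illustration only)."""

    def fit(self, values):
        # dict.fromkeys drops duplicates while keeping first-seen order.
        self.classes_ = list(dict.fromkeys(values))
        self.mapping_ = {value: index for index, value in enumerate(self.classes_)}
        self.inverse_mapping_ = {index: value for value, index in self.mapping_.items()}
        return self

    def transform(self, values):
        # Look up each value's label; unknown values raise KeyError.
        return [self.mapping_[value] for value in values]

    def inverse_transform(self, labels):
        # Map integer labels back to the original categories.
        return [self.inverse_mapping_[label] for label in labels]


string_encoder = OrderedLabelEncoder().fit(("c", "b", "a"))
string_labels = string_encoder.transform(["a", "b", "c"])

number_encoder = OrderedLabelEncoder().fit((10, 30, 20))
number_labels = number_encoder.transform([10, 20, 30])

print(string_labels)
print(number_labels)
```

With this sketch the labels follow the order in which the categories were given, matching the two examples above.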

Sep 14 '21 20:09 samuelduchesne

@kernc, not sure why CI didn't fire up here, but this is ready for a review. :)

Sep 16 '21 13:09 samuelduchesne

Try pushing a commit again to trigger the CI, perhaps?

Oct 01 '21 10:10 QuentinSoubeyran

> Try to push a commit again to trigger the CI, perhaps ?

Still not working! Weird!

Oct 01 '21 13:10 samuelduchesne

Well, I'm at a loss here... Have you run the tests locally using pytest? Maybe the CI is straight-up crashing on this PR, hence no info?

Oct 01 '21 14:10 QuentinSoubeyran
