evalml

Allow OrdinalEncoder to have unsorted numeric categories

Open · tamargrey opened this issue 2 years ago · 0 comments

  • As a user, I wish I could use the OrdinalEncoder to encode Ordinal columns whose numeric categories are not listed in strictly increasing numeric order.

The following code raises ValueError: Unsorted categories are not supported for numerical categories:

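    import pandas as pd
    import woodwork  # noqa: F401 (registers the DataFrame.ww accessor)
    from woodwork.logical_types import Ordinal

    from evalml.pipelines.components import OrdinalEncoder

    X = pd.DataFrame(
        {
            "col_1": [2, 0, 1, 0, 0],
            "col_2": ["a", "b", "a", "c", "d"],
            "col_3": ["x", "x", "x", "y", "y"],
            "col_4": [1, 2, 2, 3, 1],
        },
    )
    # col_4's order is 3, 2, 1 instead of the sorted 1, 2, 3
    X.ww.init(
        logical_types={
            "col_2": Ordinal(order=["a", "b", "c", "d"]),
            "col_4": Ordinal(order=[3, 2, 1]),
        }
    )

    encoder = OrdinalEncoder()
    encoder.fit(X)  # raises the ValueError above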

This situation is not uncommon: numbers are sometimes assigned to ordered categories in the reverse order of the categories' relative sizes (for example, a ranking where 1 denotes the highest level). We should be able to handle that case.
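A minimal sketch of the requested behavior, using plain pandas rather than evalml's actual OrdinalEncoder internals (which may work differently): each value is encoded by its position in the user-supplied Ordinal order, not by numeric sort order. `encode_by_order` is a hypothetical helper, not part of evalml or woodwork.

```python
import numpy as np
import pandas as pd


def encode_by_order(values: pd.Series, order: list) -> pd.Series:
    """Hypothetical helper: map each value to its position in `order`.

    The declared order is used as-is, so numeric categories such as
    [3, 2, 1] are encoded as 3 -> 0, 2 -> 1, 1 -> 2 with no sorting step.
    Values that do not appear in `order` become NaN.
    """
    categorical = pd.Categorical(values, categories=order, ordered=True)
    codes = pd.Series(categorical.codes, index=values.index, dtype="float64")
    # pandas uses the code -1 for values outside the declared categories
    return codes.where(codes >= 0, np.nan)


col_4 = pd.Series([1, 2, 2, 3, 1])
print(encode_by_order(col_4, [3, 2, 1]).tolist())  # [2.0, 1.0, 1.0, 0.0, 2.0]
```

Because `pd.Categorical` with `ordered=True` keeps the categories in the order given, reverse-numbered columns like `col_4` and string columns like `col_2` go through the same code path.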

tamargrey, Nov 15 '22 22:11