evalml
Allow OrdinalEncoder to have unsorted numeric categories
- As a user, I wish I could use the Ordinal Encoder to encode Ordinal columns whose numeric categories are not ordered in strictly increasing numeric order.
The following code raises the error ValueError: Unsorted categories are not supported for numerical categories
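import pandas as pd
from evalml.pipelines.components import OrdinalEncoder
from woodwork.logical_types import Ordinal

X = pd.DataFrame(
    {
        "col_1": [2, 0, 1, 0, 0],
        "col_2": ["a", "b", "a", "c", "d"],
        "col_3": ["x", "x", "x", "y", "y"],
        "col_4": [1, 2, 2, 3, 1],
    },
)
# The order for col_4 is 3, 2, 1 instead of 1, 2, 3
X.ww.init(logical_types={"col_2": Ordinal(order=["a", "b", "c", "d"]), "col_4": Ordinal(order=[3, 2, 1])})
encoder = OrdinalEncoder()
# Raises ValueError: Unsorted categories are not supported for numerical categories
encoder.fit(X)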
It isn't uncommon for numbers to be assigned to ordered categories in the reverse order of the categories' relative sizes. We should be able to handle that case.
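To make the requested behavior concrete, here is a minimal plain-Python sketch of the encoding one might expect for col_4. It assumes each value should simply be encoded as its position in the declared order. The helper name `encode_with_order` is hypothetical and is not part of evalml, woodwork, or scikit-learn.

```python
def encode_with_order(values, order):
    """Encode each value as its index in `order` (hypothetical helper).

    `order` lists the categories from lowest to highest rank and does not
    need to be numerically sorted. A value missing from `order` raises
    KeyError.
    """
    position = {category: index for index, category in enumerate(order)}
    return [position[value] for value in values]


# col_4 from the example above, with the declared order 3, 2, 1
col_4 = [1, 2, 2, 3, 1]
encoded = encode_with_order(col_4, order=[3, 2, 1])
print(encoded)  # [2, 1, 1, 0, 2]
```

Under this assumption the category 3 maps to 0, 2 maps to 1, and 1 maps to 2, which is the same rule already applied to string categories such as col_2.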