category_encoders

FutureWarning in ordinal encoder when downcasting objects

Open eangius opened this issue 1 year ago • 2 comments

Expected Behavior

No FutureWarning is emitted.

Actual Behavior

Currently the following warning is emitted:

category_encoders/ordinal.py:198: FutureWarning: Downcasting object dtype arrays on .fillna, .ffill, .bfill is deprecated and will change in a future version. Call result.infer_objects(copy=False) instead. To opt-in to the future behavior, set `pd.set_option('future.no_silent_downcasting', True)`
  X[column] = X[column].astype("object").fillna(np.nan).map(col_mapping)

None of suppressing the warning, setting the pandas option, or changing the types on the caller side is sufficient for correctness.

Steps to Reproduce the Problem

  1. Create a data frame with an object-dtype column.
  2. Fit CountEncoder (or a similar encoder) on the data frame.
  3. Notice the warning.
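
For anyone who wants to poke at this without the encoder, here is a minimal pandas-only sketch of the expression named in the warning. The data, the mapping, and the helper name `fill_and_map` are made up for illustration and are not taken from category_encoders. Whether a FutureWarning is actually recorded depends on the installed pandas version and on whether the object column holds values that pandas could downcast.

```python
import warnings

import numpy as np
import pandas as pd


def fill_and_map(values, col_mapping):
    """Run the flagged expression and collect any FutureWarning messages."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        encoded = values.astype("object").fillna(np.nan).map(col_mapping)
    messages = [str(w.message) for w in caught if issubclass(w.category, FutureWarning)]
    return encoded, messages


# Hypothetical data: an object-dtype column whose values could be inferred as numeric.
values = pd.Series([1, 2, None, 1], dtype="object")
# Hypothetical category-to-code mapping, including a code for missing values.
col_mapping = pd.Series([1, 2, -2], index=[1, 2, np.nan])

encoded, messages = fill_and_map(values, col_mapping)
print(encoded.tolist())
print(messages)  # may be empty, depending on the installed pandas version
```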

Specifications

  • Version: 2.6.3

eangius, Jun 25 '24 16:06

For what it's worth, these local changes fixed things for me & kept tests passing. If anyone is willing to make this official, it would be much appreciated.

diff --git a/category_encoders/ordinal.py b/category_encoders/ordinal.py
index 45d333e..94804c0 100644
--- a/category_encoders/ordinal.py
+++ b/category_encoders/ordinal.py
@@ -195,7 +195,7 @@ class OrdinalEncoder(util.BaseEncoder, util.UnsupervisedTransformerMixin):
 
                 # Convert to object to accept np.nan (dtype string doesn't)
                 # fillna changes None and pd.NA to np.nan
-                X[column] = X[column].astype("object").fillna(np.nan).map(col_mapping)
+                X[column] = X[column].astype("object").infer_objects(copy=False).fillna(np.nan).map(col_mapping)
                 if util.is_category(X[column].dtype):
                     nan_identity = col_mapping.loc[col_mapping.index.isna()].array[0]
                     X[column] = X[column].cat.add_categories(nan_identity)

eangius, Jun 25 '24 16:06

Thanks for reporting!

Your proposed fix seems fine, but I wonder whether something else might be better. The cast to object is just there (according to the comment) to accommodate np.nan as the fill, and we're about to map to numeric, so the dtype itself isn't critical information, and downcasting in particular isn't needed. Should we just opt in to the future behavior?

bmreiniger, Jun 26 '24 12:06
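
One way the "opt in to the future behavior" idea could look is sketched below, scoped to the single expression with `pd.option_context` instead of a global `pd.set_option`. This is only an illustration, not the change that was made in the library: the helper names and sample data are hypothetical, and the fallback for pandas versions that do not define the option is an assumption.

```python
import contextlib
import warnings

import numpy as np
import pandas as pd

OPTION = "future.no_silent_downcasting"


def opt_in_to_future_downcasting():
    """Return a context manager that enables the option where pandas defines it."""
    try:
        pd.get_option(OPTION)
    except KeyError:
        # Assumption: on versions without this option there is nothing to opt in to.
        return contextlib.nullcontext()
    return pd.option_context(OPTION, True)


def fill_and_map_opted_in(values, col_mapping):
    """Fill and map under the opt-in, recording any downcasting warnings."""
    with opt_in_to_future_downcasting():
        with warnings.catch_warnings(record=True) as caught:
            warnings.simplefilter("always")
            encoded = values.astype("object").fillna(np.nan).map(col_mapping)
    downcast_warnings = [str(w.message) for w in caught if "Downcasting" in str(w.message)]
    return encoded, downcast_warnings


# Hypothetical data and mapping, for illustration only.
values = pd.Series([1, 2, None, 1], dtype="object")
col_mapping = pd.Series([1, 2, -2], index=[1, 2, np.nan])

encoded, downcast_warnings = fill_and_map_opted_in(values, col_mapping)
print(encoded.tolist())
print(downcast_warnings)
```

Scoping the option keeps the caller's global pandas settings untouched, which seems preferable for a library, though a global opt-in would be simpler.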