category_encoders
Handle missing in one hot encoder
Expected Behavior
Currently, handle_missing="value" adds a new column for missing values, although the documentation says 'value' will encode a new value as 0 in every dummy column. The encoder should behave as documented.
We also need a test that covers this.
Actual Behavior
The encoder adds an extra column for the missing value instead of encoding it as 0 in every dummy column.
Steps to Reproduce the Problem
from category_encoders import OneHotEncoder
import pandas as pd
he = OneHotEncoder(handle_missing="value")
data = [("foo", 1), ("bar", 2), (None, 6)]
data = pd.DataFrame(data, columns=["c1", "c2"])
print(he.fit_transform(data))
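For comparison, here is a minimal sketch of the all-zero encoding the documentation describes. It uses pandas.get_dummies as a stand-in (an assumption for illustration only, not how OneHotEncoder is implemented), since get_dummies leaves missing values as 0 in every dummy column by default. The name `expected` is hypothetical.

```python
import pandas as pd

# Same data as in the reproduction steps above.
data = pd.DataFrame(
    [("foo", 1), ("bar", 2), (None, 6)],
    columns=["c1", "c2"],
)

# Stand-in for the documented behavior: with the default dummy_na=False,
# get_dummies creates no column for missing values, so the row where
# c1 is None gets 0 in every dummy column.
expected = pd.get_dummies(data["c1"], dtype=int)

print(expected)
```

A regression test for the encoder could assert the same shape: one dummy column per observed category in c1 and an all-zero row for the missing value.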
Specifications
- Version: 2.6
- Platform: linux
Would this replace the new "ignore" from #396?
I would expect this to be the correct behavior; is the added column a longstanding behavior, or perhaps a regression that wasn't caught in testing?
Oh, you're right. I missed this when adding the ignore option. Thanks for pointing it out.
I'm not sure about the naming, though. Most encoders have the option "value", which means "put in some value that makes sense", so that name makes sense for people familiar with the library. "ignore", on the other hand, is more telling.
@PaulWestenthanner I can take this if no one else has!