category_encoders

Handle missing in one hot encoder

Open • PaulWestenthanner opened this issue 2 years ago • 3 comments

Expected Behavior

With handle_missing="value", a missing value should be encoded as 0 in every dummy column, because the documentation says 'value' will encode a new value as 0 in every dummy column. Currently, however, a new column is added instead. We also need a test for this.

Actual Behavior

A new dummy column is added for the missing value instead of encoding it as 0 in every existing dummy column.
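To make the difference between the expected and actual behavior concrete without relying on category_encoders internals, here is a small analogy in plain pandas (an illustration only, not how OneHotEncoder is implemented). By default `pd.get_dummies` leaves a missing value as 0 in every dummy column, which is what the documentation describes for `handle_missing="value"`; passing `dummy_na=True` instead adds an extra indicator column, which is the behavior this issue reports.

```python
import pandas as pd

# Same toy data as in the reproduction steps: c1 has one missing value
df = pd.DataFrame({"c1": ["foo", "bar", None], "c2": [1, 2, 6]})

# Expected style: the missing row gets 0 in every c1 dummy column
expected_style = pd.get_dummies(df, columns=["c1"], dtype=int)

# Reported style: an extra indicator column (c1_nan) flags the missing row
extra_column_style = pd.get_dummies(df, columns=["c1"], dummy_na=True, dtype=int)

print(expected_style)
print(extra_column_style)
```

In the first frame, row 2 is all zeros across `c1_bar` and `c1_foo`; in the second, the same row is marked by the additional `c1_nan` column.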

Steps to Reproduce the Problem

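from category_encoders import OneHotEncoder
import pandas as pd

he = OneHotEncoder(handle_missing="value")

# c1 has two categories ("foo", "bar") plus one missing value (None); c2 is numeric and is left as-is
data = [("foo", 1), ("bar", 2), (None, 6)]
data = pd.DataFrame(data, columns=["c1", "c2"])
# Expected: the None row is 0 in every c1 dummy column; actual: an extra c1 dummy column is created
print(he.fit_transform(data))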
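The issue also asks for a test. A minimal sketch of the core check such a test could make is below; `missing_rows_are_all_zero` is a hypothetical helper, and the column names `c1_1`, `c1_2`, `c1_3` are assumed for illustration. The frames are hand-built to mimic the expected and the reported output shapes, not real OneHotEncoder output.

```python
import pandas as pd


def missing_rows_are_all_zero(encoded, missing_mask, prefix):
    """Hypothetical test helper: True if every row flagged in missing_mask
    has 0 in all columns whose name starts with prefix."""
    dummy_columns = [c for c in encoded.columns if c.startswith(prefix)]
    rows = encoded.loc[missing_mask, dummy_columns]
    return bool((rows == 0).to_numpy().all())


# Hand-built frame in the expected shape: the missing row (index 2) is all zeros
expected_shape = pd.DataFrame({"c1_1": [1, 0, 0], "c1_2": [0, 1, 0], "c2": [1, 2, 6]})
# Hand-built frame in the reported shape: an extra dummy column flags the missing row
reported_shape = pd.DataFrame(
    {"c1_1": [1, 0, 0], "c1_2": [0, 1, 0], "c1_3": [0, 0, 1], "c2": [1, 2, 6]}
)
missing = pd.Series([False, False, True])

print(missing_rows_are_all_zero(expected_shape, missing, "c1_"))
print(missing_rows_are_all_zero(reported_shape, missing, "c1_"))
```

In an actual regression test, `encoded` would be the result of `OneHotEncoder(handle_missing="value").fit_transform(data)` and `missing_mask` would be `data["c1"].isna()`; a check that the number of c1 dummy columns equals the number of non-missing categories would also catch the extra column.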

Specifications

  • Version: 2.6
  • Platform: linux

PaulWestenthanner, Mar 12 '23

Would this replace the new "ignore" option from #396?

I would expect this to be the correct behavior; is the added column a longstanding behavior, or perhaps a regression that wasn't caught in testing?

bmreiniger, Mar 21 '23

Oh, you're right. I missed this when adding the ignore option. Thanks for pointing it out.
I'm not sure about the naming, though. Most encoders have the option value, which puts in "some value that makes sense", so it is familiar to people who know the library; ignore, on the other hand, is more descriptive.

PaulWestenthanner, Mar 24 '23

@PaulWestenthanner I can take this if no one else has!

lazarust, Oct 16 '23