category_encoders

Add argument for alternative indexing of OrdinalEncoder

Open wildcat47 opened this issue 4 years ago • 0 comments

Expected Behavior

There are a variety of applications in which zero-indexing would be preferred for the OrdinalEncoder. One example is preparing features for a PyTorch model with categorical embeddings, in which case the ordinal label is used as a row index into an embedding matrix. Note also that the sklearn OrdinalEncoder is zero-indexed.
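To make the embedding case concrete, here is a minimal sketch that uses a NumPy array as a stand-in for an embedding matrix (an assumption for illustration; a PyTorch embedding layer is not used here). The sizes and variable names are hypothetical. With three categories the matrix has rows 0 to 2, so one-indexed labels 1 to 3 run past the last row.

```python
import numpy as np

# Hypothetical sizes: 3 categories, 4-dimensional embedding vectors.
num_categories, embedding_dim = 3, 4
embedding = np.arange(num_categories * embedding_dim, dtype=float).reshape(
    num_categories, embedding_dim
)

zero_indexed = np.array([0, 1, 2])  # labels as a zero-indexed encoder would emit them
one_indexed = zero_indexed + 1      # labels as a one-indexed encoder would emit them

# Zero-indexed labels select one row per category.
vectors = embedding[zero_indexed]

# One-indexed labels include 3, which is past the last valid row (2).
try:
    embedding[one_indexed]
    out_of_range = False
except IndexError:
    out_of_range = True
```

Sizing the matrix with one extra row would also avoid the error, at the cost of an unused row 0.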

One could add an argument to __init__() that specifies the starting index (e.g., index_start, stored as self.index_start), so that the ordinal_encoding() method can do something like:

data = pd.Series(index=index, data=range(self.index_start, len(index) + self.index_start))
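As a standalone sketch of the idea (not the library's code), the function below builds the category-to-integer mapping with a configurable start. The function name build_ordinal_mapping and the index_start parameter are hypothetical, and the default of 1 is assumed only to mirror the current one-indexed behavior.

```python
import pandas as pd


def build_ordinal_mapping(categories, index_start=1):
    """Map each unique category to consecutive integers starting at index_start.

    Hypothetical helper for illustration; categories are numbered in order
    of first appearance.
    """
    index = pd.unique(pd.Series(categories))
    return pd.Series(index=index, data=range(index_start, len(index) + index_start))


colors = ["red", "green", "red", "blue"]

one_based = build_ordinal_mapping(colors)                  # current behavior
zero_based = build_ordinal_mapping(colors, index_start=0)  # proposed option

# Apply the zero-based mapping to the original values.
encoded = pd.Series(colors).map(zero_based)
```

Keeping 1 as the default would leave existing users unaffected, while index_start=0 would match the sklearn convention.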

Actual Behavior

The ordinal_encoding() method imposes one-indexing in this line: data = pd.Series(index=index, data=range(1, len(index) + 1))

Specifications

  • Version: 2.2.2

wildcat47, Feb 11 '21 21:02