category_encoders
Add argument for alternative indexing of OrdinalEncoder
Expected Behavior
There are a variety of applications in which zero-indexing would be preferred for the OrdinalEncoder. One example is preparing features for a PyTorch model with categorical embeddings, in which case the ordinal label is used as a row index into an embedding matrix. Note also that the sklearn OrdinalEncoder is zero-indexed.
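To illustrate the embedding case, here is a minimal sketch in plain Python. The list-of-lists table is only a stand-in for an embedding matrix (it is not PyTorch code), and make_embedding_table and lookup are hypothetical names used for illustration. A table with N rows accepts indices 0 through N-1, so one-indexed labels run off the end unless the table is padded with an extra row.

```python
import random

def make_embedding_table(num_categories, dim, seed=0):
    # Hypothetical stand-in for an embedding matrix: one row per category.
    rng = random.Random(seed)
    return [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(num_categories)]

def lookup(table, labels):
    # Each ordinal label is used directly as a row index.
    return [table[label] for label in labels]

categories = ["red", "green", "blue"]
table = make_embedding_table(len(categories), dim=4)

zero_indexed = [0, 1, 2]  # labels as a zero-indexed encoder would emit them
one_indexed = [1, 2, 3]   # labels as a one-indexed encoder would emit them

vectors = lookup(table, zero_indexed)  # every label maps to a valid row

try:
    lookup(table, one_indexed)
    one_indexed_ok = True
except IndexError:
    # Label 3 has no row in a 3-row table.
    one_indexed_ok = False
```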
One option is to add an argument to __init__() that specifies where the indexing starts (e.g., stored as self.index_start), so that the ordinal_encoding() method can do something like:
data = pd.Series(index=index, data=range(self.index_start, len(index) + self.index_start))
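As a rough, standalone sketch of the idea (not the library's actual code), the mapping could be built as follows. The build_ordinal_mapping helper and its index_start parameter are hypothetical names. The default of 1 is an assumption chosen to keep the current behavior unchanged for existing users.

```python
import pandas as pd

def build_ordinal_mapping(categories, index_start=1):
    # Hypothetical helper, not part of category_encoders.
    # Unique categories in order of first appearance.
    index = pd.Index(categories).unique()
    # Same expression as proposed above, with the start made configurable.
    return pd.Series(index=index, data=range(index_start, len(index) + index_start))

colors = ["red", "green", "blue", "green"]

one_based = build_ordinal_mapping(colors)                  # current behavior
zero_based = build_ordinal_mapping(colors, index_start=0)  # proposed option

print(zero_based)
```

With index_start=0 the labels can be used directly as row indices, while the default leaves existing pipelines untouched.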
Actual Behavior
The ordinal_encoding() method imposes one-indexing in this line: data = pd.Series(index=index, data=range(1, len(index) + 1))
Specifications
- Version: 2.2.2