feature_engine Feat/binarizer without column transformer

Open d-lazenby opened this issue 6 months ago • 3 comments

Issue raised here

Notes on Code

The BinaryDiscretiser class is implemented in binariser.py, located with the other discretisers, and takes a parameter threshold to determine where to split the interval.

After the standard checks and the type check on threshold, a further check verifies that min(x) < threshold < max(x) for each feature x (L167). If the condition fails, x isn't transformed and the user is notified. The remaining features are collected in a list for transformation.
Because of this, the transform method from BaseDiscretiser is re-implemented here so that it iterates only over the features that passed the threshold check, rather than over the list in self.variables_. I'm not sure if there's a cleaner way of doing this. Alternatively, we could modify the self.variables_ attribute directly in the fit method, which might make sense: it would then contain only the features that were actually transformed, and there would be no need to re-implement the transform method.
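To make the second option concrete, here is a minimal sketch of filtering the variables in fit so that transform can loop over self.variables_ unchanged. It is not the code in this PR: the class name ThresholdBinariserSketch is hypothetical, it does not inherit from BaseDiscretiser, and mapping values at or above the threshold to 1 is an assumption about which side is inclusive.

```python
import pandas as pd


class ThresholdBinariserSketch:
    """Illustrative stand-in, not the BinaryDiscretiser from this PR."""

    def __init__(self, threshold, variables):
        self.threshold = threshold
        self.variables = list(variables)

    def fit(self, X):
        # Keep only features whose observed range strictly contains the
        # threshold, so variables_ lists exactly what will be transformed.
        self.variables_ = [
            var
            for var in self.variables
            if X[var].min() < self.threshold < X[var].max()
        ]
        return self

    def transform(self, X):
        X = X.copy()
        for var in self.variables_:
            # Assumption: values >= threshold become 1, all others 0.
            X[var] = (X[var] >= self.threshold).astype(int)
        return X


df = pd.DataFrame({"a": [1.0, 5.0, 9.0], "b": [10.0, 20.0, 30.0]})
binariser = ThresholdBinariserSketch(threshold=4.0, variables=["a", "b"]).fit(df)
out = binariser.transform(df)
```

In this example the threshold 4.0 lies inside the range of "a" but below the minimum of "b", so only "a" ends up in variables_ and "b" is returned untouched.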

Other notes

I updated the docs apart from the user_guide, since this might change depending on further changes to the implementation.
I've tested on an sklearn Pipeline and it seems to work fine, but I haven't included explicit tests for that as they were missing for the other discretisers. Let me know if that's something you'd want.
It might be nice to have functionality where the user can pass a set of different thresholds for each feature passed to the class (could be corresponding lists for threshold and variables parameters, or a dictionary of pairs).
The threshold check output is written to stdout at the moment, but this should perhaps be given as a warning instead.
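The last two points could be combined roughly as follows. This is only a sketch under assumptions: the helper name select_binarisable and the dictionary-of-thresholds interface are hypothetical and not part of feature_engine or this PR. It uses the standard-library warnings module in place of printing to stdout, and the built-in min and max so that it works on plain lists as well as DataFrame columns.

```python
import warnings


def select_binarisable(data, thresholds):
    """Return the variables whose threshold lies strictly inside their range.

    `data` maps variable names to sequences of values; `thresholds` maps
    variable names to the per-variable threshold (hypothetical interface).
    """
    selected = []
    for var, threshold in thresholds.items():
        low, high = min(data[var]), max(data[var])
        if low < threshold < high:
            selected.append(var)
        else:
            # Emit a warning rather than writing to stdout.
            warnings.warn(
                f"threshold {threshold} is outside ({low}, {high}) for "
                f"variable '{var}'; it will not be transformed.",
                UserWarning,
                stacklevel=2,
            )
    return selected


data = {"age": [18, 35, 70], "fare": [5.0, 7.5, 9.0]}
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    kept = select_binarisable(data, {"age": 40, "fare": 50.0})
```

A warning has the advantage that users can filter, silence, or escalate it to an error with the usual warnings machinery, which is not possible with output written to stdout.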

Finally

This is my first time contributing to open source, so all feedback is very welcome!

Aug 16 '24 06:08 d-lazenby
