feature_engine
feature_engine copied to clipboard
(In progress) Backoff binning rare labels in CountFrequency encoder
Reference Issues/PRs: Issue : #429 (Add functionality for threshold)
What does this implement/fix? Adds functionality to group categories with few observations.
Hi, @solegalli I've added functionality for Backoff binning, threshold is a list that takes values(threshold) for each feature mentioned and bins them accordingly. I haven't added the tests yet but will add them.
Please let me know any improvements that you think could be done here. I'll implement them.
Isn't this what RareLabelEncoder
is for? If I understand it correctly, this would have the same effect as putting both of them in scikit-learn pipeline - or am I missing some detail?
Isn't this what
RareLabelEncoder
is for? If I understand it correctly, this would have the same effect as putting both of them in scikit-learn pipeline - or am I missing some detail?
yep, it is.
@solegalli I've implemented the changes, I've added transformer and inverse transformer(no functionality) and tests. Please let me know if any changes are to be made.
Thanks.
Hi @solegalli , please find the checklist of suggestions implemented.(made a checklist since the number of changes were more)
- [x] Reverted back transform docstring to base_encoder's transform docstring.
- [x] Remove threshold doc from fit().
- [x] Redo fit() with original logic as per suggestion, reduced duplication of code.
- [x] Redo transform().
- [x] Preserve original layout for fit(),
- [ ] Could not preserve original layout for raising value errors because of flake8 lint errors
- [x] Add alternative explanation for backoff binning to make it easily understandable.
Hi @SangamSwadiK
@david-cortes is right that the logic we are trying to implement here is already done by another transformer, the RareLabelEncoder.
So I am having second thoughts as to whether it is a good idea to make the class more complicated by adding additional logic that is already taken care of elsewhere.
Would you mind if we put this PR on hold and see what the community thinks about this?
Sorry, I know it is frustrating to work on something that does not go through eventually :_(
Hi @SangamSwadiK
@david-cortes is right that the logic we are trying to implement here is already done by another transformer, the RareLabelEncoder.
So I am having second thoughts as to whether it is a good idea to make the class more complicated by adding additional logic that is already taken care of elsewhere.
Would you mind if we put this PR on hold and see what the community thinks about this?
Sorry, I know it is frustrating to work on something that does not go through eventually :_(
yes, its similar to Rarelabel encoder, meanwhile ill take a look at other issues. Thanks !