Ordinal encoder outputs -1 for unknown Categories

Open datacubeR opened this issue 2 years ago • 4 comments

This is my shot at #428. I noticed that OrdinalEncoder inherits the transform method from CategoricalMethodsMixin, so I added an additional condition to output -1 only for OrdinalEncoder.

Additionally, I had to fix test_error_if_input_df_contains_categories_not_present_in_training_df so that it expects an error when errors='raise' and checks the encoded output when errors='ignore'.
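To illustrate the behaviour described above, here is a minimal pandas-only sketch. It is not the feature_engine implementation: the function name `ordinal_encode` and its signature are hypothetical, and it assumes the input contains no missing values.

```python
import pandas as pd


def ordinal_encode(series, mapping, errors="ignore"):
    """Hypothetical helper: map categories to integers, -1 for unseen ones.

    Assumes `series` has no missing values, so every NaN produced by
    `map` corresponds to a category that is absent from `mapping`.
    """
    encoded = series.map(mapping)  # categories not in `mapping` become NaN
    unseen = encoded.isna()

    if unseen.any() and errors == "raise":
        raise ValueError(
            f"Unseen categories: {sorted(series[unseen].unique())}"
        )

    # errors="ignore": replace the NaN left by unseen categories with -1
    return encoded.where(~unseen, -1).astype(int)


mapping = {"a": 0, "b": 1, "c": 2}        # learned from the training data
new_data = pd.Series(["a", "d", "c"])     # "d" was not seen during fit

result = ordinal_encode(new_data, mapping)  # "d" is encoded as -1
```

With errors="raise", the same call raises a ValueError instead of returning -1 for "d".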

datacubeR avatar Aug 13 '22 05:08 datacubeR

I'm not sure what happened with this branch. My fork is up to date, but for some reason the PR includes changes from the previously merged commit. I will check what happened here...

datacubeR avatar Aug 13 '22 18:08 datacubeR

FYI #502

solegalli avatar Aug 18 '22 10:08 solegalli

Hey @solegalli, I'm not sure whether this is already implemented or whether there is still something left to implement here. Either way, I would like to keep contributing to Feature Engine, so I would be happy to help with this PR or any other.

datacubeR avatar Sep 03 '22 01:09 datacubeR

Hi @datacubeR

In this PR, we want the OrdinalEncoder to output -1 for unseen categories. It is not implemented yet.

We did something similar for the CountFrequencyEncoder, where we made it output 0 for unseen categories.
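As a rough illustration of that idea (a pandas-only sketch, not the CountFrequencyEncoder source; `count_encode` is a hypothetical helper and the input is assumed to have no missing values):

```python
import pandas as pd


def count_encode(series, counts):
    """Hypothetical helper: replace categories by their training counts.

    Categories absent from `counts` become NaN after `map` and are
    then filled with 0.
    """
    return series.map(counts).fillna(0).astype(int)


train = pd.Series(["a", "a", "b"])
counts = train.value_counts().to_dict()   # {"a": 2, "b": 1}

encoded = count_encode(pd.Series(["a", "b", "z"]), counts)  # "z" is unseen
```

The same pattern, with -1 as the fill value, is what this PR needs for the OrdinalEncoder.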

My suggestion was that you use the implementation in CountFrequencyEncoder as a template for this PR.

Also, since this PR was first opened, we have made a few structural changes to the main source code, so some imports changed slightly. The branch therefore needs rebasing.

It would be super useful if you could finish this PR first, which should include very few code changes. And then I would be more than happy to tag you in a new PR :)

Thanks a lot!

solegalli avatar Sep 04 '22 13:09 solegalli

Functionality added in #539.

solegalli avatar Oct 13 '22 10:10 solegalli