shapash
Compilation of different preprocessing methods
Hi, I've just checked out Shapash. I've seen this line many times in the documentation:
preprocessing=encoder, # Optional: compile step can use inverse_transform method
However, I'm not sure how to proceed with this. I checked the code here, but I'm not clear about passing a dict or a list_of_dict to preprocessing.
I have this example; could you please advise me how to proceed with it?
Original df:
   A  B1   B2   C1   C2   E
1  0  B11  B03  C02  C04  1
2  1  B03  B04  C03  C04  1
3  0  B02  B03  C02  C02  1
4  1  B04  B03  C02  C03  0
I want to one-hot encode A and E, and apply a MultiLabelBinarizer to (B1, B2) and to (C1, C2) (both encoders are from sklearn).
Target df:
   A0  A1  B02  B03  B04  B11  C02  C03  C04  E0  E1
1  1   0   0    1    0    1    1    0    1    0   1
2  0   1   0    1    1    0    0    1    1    0   1
3  1   0   1    1    0    0    2    0    0    0   1
4  0   1   0    1    1    0    1    1    0    1   0
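For reference, the encoding described above can be sketched in plain pandas. This is a minimal sketch, not a Shapash-compatible preprocessing: `pd.get_dummies` stands in for sklearn's OneHotEncoder, and an "is this label present in any column of the group" check stands in for MultiLabelBinarizer. The helper names `one_hot` and `multi_label` are hypothetical. Because a binarizer only records presence, row 3 (C02 twice) gets C02 = 1 here, not 2 as in the target table.

```python
import pandas as pd

# Original data from the question, indexed 1..4 like the tables above.
df = pd.DataFrame(
    {
        "A": [0, 1, 0, 1],
        "B1": ["B11", "B03", "B02", "B04"],
        "B2": ["B03", "B04", "B03", "B03"],
        "C1": ["C02", "C03", "C02", "C02"],
        "C2": ["C04", "C04", "C02", "C03"],
        "E": [1, 1, 1, 0],
    },
    index=[1, 2, 3, 4],
)


def one_hot(col: pd.Series) -> pd.DataFrame:
    """One-hot encode one column, naming outputs A0, A1, ... (no separator)."""
    return pd.get_dummies(col.astype(str), prefix=col.name, prefix_sep="").astype(int)


def multi_label(group: pd.DataFrame) -> pd.DataFrame:
    """Binarize columns sharing one label set: 1 if the label appears in any of them."""
    labels = sorted(pd.unique(group.to_numpy().ravel()))
    return pd.DataFrame(
        {label: (group == label).any(axis=1).astype(int) for label in labels},
        index=group.index,
    )


encoded = pd.concat(
    [
        one_hot(df["A"]),
        multi_label(df[["B1", "B2"]]),
        multi_label(df[["C1", "C2"]]),
        one_hot(df["E"]),
    ],
    axis=1,
)
print(encoded)
```

Each group of columns is binarized against its own sorted label set, which is why the output columns come out in the same order as the target table.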
Since I have multiple encoders applied to different columns, how should I pass them to preprocessing?
Thank you very much
Hi,
I recommend taking a look at the encoding tutorials for a better understanding (see the tutorial).
However, at the moment we don't support sklearn's MultiLabelBinarizer.
We support:
- from sklearn: OneHotEncoder / OrdinalEncoder / StandardScaler / QuantileTransformer / PowerTransformer
- from category_encoders: OneHotEncoder / OrdinalEncoder / BaseNEncoder / BinaryEncoder / TargetEncoder
- or a dict with the mapping needed
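To illustrate what "a dict with the mapping needed" can mean, here is a minimal pandas sketch with one mapping dict per encoded column, plus the inverse step an inverse_transform needs to recover the original labels. The key names `col`, `mapping` and `data_type` and the helpers `encode`/`decode` are assumptions for illustration only; check the Shapash encoding tutorial for the exact dict format `preprocessing` expects. Each dict maps a single column's labels to integer codes, so it fits ordinal-style encodings rather than a multi-column binarizer.

```python
import pandas as pd

# Hypothetical mapping dicts, one per encoded column. The keys "col",
# "mapping" and "data_type" are assumed for illustration only.
# "mapping": index = original label, values = encoded code.
mapping_dicts = [
    {"col": "B1", "mapping": pd.Series([0, 1, 2, 3], index=["B02", "B03", "B04", "B11"]), "data_type": "object"},
    {"col": "C1", "mapping": pd.Series([0, 1, 2], index=["C02", "C03", "C04"]), "data_type": "object"},
]


def encode(df: pd.DataFrame, dicts: list) -> pd.DataFrame:
    """Replace each mapped column's labels with their codes."""
    out = df.copy()
    for d in dicts:
        out[d["col"]] = out[d["col"]].map(d["mapping"])
    return out


def decode(df: pd.DataFrame, dicts: list) -> pd.DataFrame:
    """Inverse step: swap each mapping's index and values, then map codes back."""
    out = df.copy()
    for d in dicts:
        inverse = pd.Series(d["mapping"].index.to_list(), index=d["mapping"].to_list())
        out[d["col"]] = out[d["col"]].map(inverse).astype(d["data_type"])
    return out


df = pd.DataFrame({"B1": ["B11", "B03", "B02", "B04"], "C1": ["C02", "C03", "C02", "C02"]})
encoded = encode(df, mapping_dicts)
decoded = decode(encoded, mapping_dicts)
print(encoded)
print(decoded)
```

Keeping one dict per column in a list is one way to handle several encoded columns at once; the inverse only works because each mapping is one-to-one.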
I don't know how complex your problem is, but you may be able to use the features_groups option of the compile step to get the importance of A, B, C and E.
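The grouping idea can be sketched as follows, assuming groups are given as a dict from a group name to the list of encoded columns it covers (the exact format features_groups expects should be checked in the Shapash documentation). Since additive contributions such as SHAP values sum to the model output minus a base value, a group's contribution can be taken as the sum of its columns' contributions. The helpers `group_contributions` and `group_importance` are hypothetical, and the contribution values are placeholders, not real model output.

```python
import pandas as pd

# Hypothetical grouping: each original feature -> its encoded columns.
features_groups = {
    "A": ["A0", "A1"],
    "B": ["B02", "B03", "B04", "B11"],
    "C": ["C02", "C03", "C04"],
    "E": ["E0", "E1"],
}


def group_contributions(contrib: pd.DataFrame, groups: dict) -> pd.DataFrame:
    """Sum per-column contributions within each group (valid for additive values)."""
    return pd.DataFrame({name: contrib[cols].sum(axis=1) for name, cols in groups.items()})


def group_importance(contrib: pd.DataFrame, groups: dict) -> pd.Series:
    """Global importance of each group: mean absolute grouped contribution."""
    return group_contributions(contrib, groups).abs().mean()


# Placeholder contributions (all 1.0), not real model output.
all_cols = [c for cols in features_groups.values() for c in cols]
toy = pd.DataFrame(1.0, index=[0, 1], columns=all_cols)
importance = group_importance(toy, features_groups)
print(importance)
```

Summing before taking the absolute value lets contributions of opposite sign within a group cancel, which matches treating the group as a single feature.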