Compilation of different preprocessing methods

Open NTNguyen13 opened this issue 3 years ago • 1 comments

Hi, I've just started looking at Shapash. I see this line a lot in the documentation: preprocessing=encoder, # Optional: compile step can use inverse_transform method

However, I'm not sure how to proceed with this. I checked the code here, but I'm not clear on how to pass a dict or list_of_dict to preprocessing.

I have this example; could you please advise me on how to proceed with it?

Original df:

   A   B1   B2   C1   C2   E
1  0   B11  B03  C02  C04  1
2  1   B03  B04  C03  C04  1
3  0   B02  B03  C02  C02  1
4  1   B04  B03  C02  C03  0

I want to one-hot encode A and E, and apply a MultiLabelBinarizer to (B1, B2) and (C1, C2) (both encoders are from sklearn).

Target df:

    A0   A1   B02  B03  B04  B11  C02  C03  C04  E0  E1
1   1    0    0    1    0    1    1    0    1    0   1
2   0    1    0    1    1    0    0    1    1    0   1
3   1    0    1    1    0    0    2    0    0    0   1
4   0    1    0    1    1    0    1    1    0    1   0

Because I have multiple encoders for multiple columns, how should I pass them to preprocessing?

Thank you very much

NTNguyen13, Feb 25 '22 11:02

Hi,

I recommend taking a look at the encoding tutorials for a better understanding.

But at the moment we don't support the MultiLabelBinarizer from sklearn.

We support:
from sklearn: OneHotEncoder / OrdinalEncoder / StandardScaler / QuantileTransformer / PowerTransformer
from category_encoders: OneHotEncoder / OrdinalEncoder / BaseNEncoder / BinaryEncoder / TargetEncoder
or a dict with the mapping needed
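
For the part of the example that is supported (one-hot encoding A and E), here is a minimal sketch using sklearn only. Wrapping the encoder in a ColumnTransformer is an assumption about how it would be handed to Shapash, and the names xpl, X_encoded and model in the final comment are hypothetical, so please check the encoding tutorials before relying on it.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

# Columns A and E from the "Original df" in the question.
df = pd.DataFrame({"A": [0, 1, 0, 1], "E": [1, 1, 1, 0]}, index=[1, 2, 3, 4])

# One-hot encode A and E; sparse_threshold=0 forces a dense array as output.
encoder = ColumnTransformer(
    transformers=[("onehot", OneHotEncoder(), ["A", "E"])],
    sparse_threshold=0,
)

# Resulting columns, in order: A0, A1, E0, E1.
encoded = encoder.fit_transform(df)
print(encoded)

# Hypothetical usage (assumes `xpl` is a SmartExplainer, `model` a fitted model
# and `X_encoded` the encoded features), following the line quoted in the question:
# xpl.compile(x=X_encoded, model=model, preprocessing=encoder)
```

The fitted encoder object is what would be passed as preprocessing, so that the compile step can map the encoded columns back to A and E.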

I don't know how complex your problem is, but maybe you can use the features_groups parameter of the compile step to get the importance of A, B, C or E.
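
As a rough sketch of that idea, the encoded columns from the "Target df" could be grouped by their original variable and the resulting dict passed as features_groups. The helper group_by_prefix is hypothetical (not part of Shapash), and it assumes each encoded column name starts with the letter of its source variable, as in the example above.

```python
# Encoded column names taken from the "Target df" in the question.
encoded_columns = [
    "A0", "A1", "B02", "B03", "B04", "B11", "C02", "C03", "C04", "E0", "E1",
]


def group_by_prefix(columns):
    """Group encoded columns by their first letter (hypothetical helper)."""
    groups = {}
    for name in columns:
        groups.setdefault(name[0], []).append(name)
    return groups


features_groups = group_by_prefix(encoded_columns)
print(features_groups)

# Hypothetical usage (assumes `xpl` is a SmartExplainer, `model` a fitted model
# and `X_encoded` a DataFrame with the columns listed above):
# xpl.compile(x=X_encoded, model=model, features_groups=features_groups)
```

This way the model can still be trained on the binarized B and C columns, while the explanations are reported at the level of A, B, C and E.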

SebastienBidault, Feb 28 '22 23:02