ml_preprocessing
Using Encoder.oneHot like scikit-learn LabelBinarizer
First of all, thanks for this nice package! Sadly, I am already stuck at the beginning.
I am trying to use Encoder.oneHot like the LabelBinarizer from scikit-learn, but I am not sure how to achieve that, or whether it is even possible.
What I want is basically this:
from sklearn.preprocessing import LabelBinarizer
# create a one-hot encoder for my labels
y = ["a", "b", "c", ...] # the labels I want to one-hot encode
lb = LabelBinarizer()
lb.fit(y)
o_y = lb.transform(y)
# inference of CNN
...
# use the encoder on a prediction of a CNN to get the label (string) of the class
prediction = lb.inverse_transform(predicted)
Encoder.oneHot forces me to provide a DataFrame instance to the constructor. However, from the README it is not clear to me what that DataFrame should look like (also, could you please update the link to the Black Friday data set?).
Your help would be highly appreciated!
@CaptainDario Thank you for creating the issue! Indeed, the README says too little about encoding. I'd recommend looking at the live example. Although a different encoder is used there, the key idea is the same: encoders from this lib infer labels from the provided data on their own, which is why you need to provide the data first (using a DataFrame). I think it would be a good idea to add the ability to provide labels directly to encoders; I'll consider this in future updates of the lib.
@CaptainDario And regarding the additional info in the README: I see your point. It really does need some words on encoding, and I'll fix the link.
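To illustrate the idea of an encoder inferring its labels from the data it is given, here is a minimal pure-Python sketch. It is not the library's implementation: the helper names `infer_labels` and `one_hot` are hypothetical, and the first-seen label order is an assumption made for the example.

```python
def infer_labels(column):
    """Collect the distinct values of a column, in first-seen order (an assumption)."""
    return list(dict.fromkeys(column))


def one_hot(value, labels):
    """Return a 0/1 vector with a single 1 at the position of `value`."""
    return [1 if label == value else 0 for label in labels]


# The "data" the encoder sees first; the labels are derived from it.
column = ["a", "b", "c", "a"]
labels = infer_labels(column)
encoded = [one_hot(value, labels) for value in column]
```

Because the labels come from the data itself, any value that never appears in the provided column has no position in the encoded vector, which is why the data has to be supplied up front.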
Thank you for your quick help.
If I understand that right, I need to create a DataFrame with a feature containing all my values, like this:
DataFrame([
["My Feature"],
["a"],
["b"],
["c"],
...,
["z"]
])
and then the created encoder will be able to convert new instances back to the label, right?
Okay, I tried the above approach and it seems to be working.
However, the application crashes if the optional parameter featureNames is not given. Maybe it would be good to encode all labels/features if the parameter is unset.
But does an encoder provide a method to reverse the one-hot encoding? I am thinking of something like unprocess, which takes a DataFrame like this:
final dataFrame = DataFrame([
["character"], ["a"], ["b"], ["c"], ["d"],
]);
final encoder = Encoder.oneHot(dataFrame, featureNames: ["character"]);
final prediction = DataFrame([[0], [0], [0], [1]]);
final decoded = encoder.unprocess(prediction);
decoded would then contain the value "d".
That would be really helpful.
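For reference, the reverse mapping described above can be sketched in a few lines of plain Python. This is only an illustration of the idea, not an API of the package: `decode_one_hot` is a hypothetical helper, and it assumes the label order used for encoding is known.

```python
def decode_one_hot(vector, labels):
    """Map a one-hot (or score) vector back to the label at its largest entry."""
    if len(vector) != len(labels):
        raise ValueError("vector length must match the number of labels")
    # Index of the largest entry; for a strict one-hot vector this is the position of the 1.
    best_index = max(range(len(vector)), key=lambda i: vector[i])
    return labels[best_index]


# Assumed label order, matching the order used when encoding.
labels = ["a", "b", "c", "d"]
decoded = decode_one_hot([0, 0, 0, 1], labels)
```

Taking the largest entry rather than searching for an exact 1 means the same helper also works on raw CNN outputs such as class probabilities.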
@CaptainDario Thank you very much for such valuable feedback. I'll consider adding this functionality to the lib. Do you have any other problems with the package?
Otherwise the package seems to be doing exactly what I want. Thank you!
Because I need something like an unprocess method to make progress with my app, I will try to implement it for Encoder.oneHot.
Do you think adding unprocess to encoder_impl.dart would be suitable?
@CaptainDario I need to think it over; unprocess sounds a bit unclear to me.