Soledad Galli

354 comments by Soledad Galli

Thank you so much for the detailed explanation, @PeterPirog. I had not come across this type of analysis before. I reckon you need the specific label names for the analysis,...

If we were to make this change, I would prefer to leave the functionality of `q` as it is now, and add an additional parameter, called `labels`, that defaults to...
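pandas' `qcut` separates the two concerns in a similar way: `q` controls how many quantile bins are built, and a separate `labels` argument controls what each bin is called. The sketch below uses only `pd.qcut` to show that separation. How the proposed `labels` parameter would behave in feature-engine, including its default (elided in the comment above), is not shown here.

```python
import pandas as pd

values = pd.Series([1, 2, 3, 4, 5, 6, 7, 8])

# Only q: the bins are returned as interval categories.
intervals = pd.qcut(values, q=4)

# q plus labels=False: the same bins, returned as integer codes 0..3.
codes = pd.qcut(values, q=4, labels=False)

# q plus explicit labels: the same bins, returned under user-supplied names.
named = pd.qcut(values, q=4, labels=["Q1", "Q2", "Q3", "Q4"])

print(list(codes))  # [0, 0, 1, 1, 2, 2, 3, 3]
```

Because `labels` only renames the bins, adding it leaves the binning driven by `q` unchanged.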

Hi @Morgan-Sell, the IterativeImputer will return a continuous value to impute NA. But some variables are categorical, so classification would be more suitable than regression. NaNs are handled during...
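scikit-learn's `IterativeImputer` uses a regressor (`BayesianRidge` by default), so its imputations are continuous. The sketch below shows the classification alternative for a single categorical column: fit a classifier on the rows where the category is observed, then predict it for the rows where it is missing. The helper `impute_categorical_with_classifier`, the column names, and the choice of a 1-nearest-neighbour classifier (picked only because it makes the toy output deterministic) are illustrative assumptions. This is not how feature-engine or scikit-learn implement imputation.

```python
import numpy as np
import pandas as pd
from sklearn.neighbors import KNeighborsClassifier


def impute_categorical_with_classifier(df, target, predictors, model):
    """Hypothetical helper: fill NaNs in `target` with class predictions.

    Assumes the `predictors` columns have no missing values.
    """
    out = df.copy()
    missing = out[target].isna()
    # Train only on rows where the categorical target is observed.
    model.fit(out.loc[~missing, predictors], out.loc[~missing, target])
    # Predict a category (not a continuous value) for the missing rows.
    out.loc[missing, target] = model.predict(out.loc[missing, predictors])
    return out


df = pd.DataFrame({
    "age": [22, 25, 47, 52, 46, 23, 50],
    "income": [20, 24, 60, 65, 58, 21, 62],
    "segment": ["young", "young", "senior", "senior", "senior", np.nan, np.nan],
})

filled = impute_categorical_with_classifier(
    df, target="segment", predictors=["age", "income"],
    model=KNeighborsClassifier(n_neighbors=1),
)
print(filled["segment"].tolist())
```

Every imputed value is one of the observed categories, which a rounded regression output cannot guarantee.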

By any chance, is there a link to an article? I've heard of this manipulation for categorical variables before, so I was wondering if we could gather a few more...

> I've not found any valuable reference to this method. I think [KBinsDiscretizer](https://scikit-learn.org/stable/modules/preprocessing.html#preprocessing-discretization) is similar, but in this case the output is only 1-dimensional (the cluster label). This might...
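To make the "only 1-dimensional" point concrete, the sketch below compares the shapes of the two outputs on a toy single-feature array. It assumes `encode="ordinal"` for `KBinsDiscretizer`. For contrast, it uses `KMeans.transform`, which returns the distance to every centroid. The data are made up for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import KBinsDiscretizer

# One feature with three visible groups.
X = np.array([[1.0], [1.2], [1.1], [5.0], [5.3], [9.8], [10.1], [9.9]])

# KBinsDiscretizer with the kmeans strategy: one bin index per row and feature.
disc = KBinsDiscretizer(n_bins=3, encode="ordinal", strategy="kmeans")
bins = disc.fit_transform(X)

# KMeans.transform: one column per cluster (distance to each centroid).
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
distances = km.transform(X)

print(bins.shape, distances.shape)  # (8, 1) (8, 3)
```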

Thanks for adding the reference to the book. The author (of the book you linked in a previous comment) mentions that k-means featurization makes sense on variables where the Euclidean...
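k-means groups points by Euclidean distance, so a variable on a much larger scale dominates the clusters. A common precaution, assumed here rather than taken from the book, is to standardize the variables before clustering. The sketch fits k-means on two variables jointly. It uses the cluster label (`fit_predict`) or the centroid distances (`transform`) as new features. The data and the two-cluster setting are made up for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Two variables on very different scales.
X = np.array([
    [1.0, 1000.0], [1.1, 1010.0], [0.9, 990.0],
    [5.0, 5000.0], [5.2, 5050.0], [4.9, 4950.0],
])

# Scale first, so that both variables contribute to the Euclidean distance.
featurizer = make_pipeline(
    StandardScaler(),
    KMeans(n_clusters=2, n_init=10, random_state=0),
)

labels = featurizer.fit_predict(X)   # one cluster label per row
distances = featurizer.transform(X)  # distance to each of the 2 centroids
print(labels, distances.shape)
```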

Reading further in the book Feature Engineering for Machine Learning by Alice Zheng: bin counting is in essence target mean encoding, as per the example number of clicks / sum(clicks +...
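The equivalence can be checked with a small pandas sketch. For a binary target, clicks divided by all impressions of a category (clicks plus non-clicks) is the per-category mean of the target, which is what target mean encoding computes. The column names `ad_id` and `clicked` and the data are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    "ad_id": ["a", "a", "a", "b", "b", "c"],
    "clicked": [1, 0, 1, 0, 0, 1],
})

# Bin counting: clicks and total impressions (clicks + non-clicks) per category.
stats = df.groupby("ad_id")["clicked"].agg(clicks="sum", total="count")
stats["click_rate"] = stats["clicks"] / stats["total"]

# Replace each category with its click rate (the encoded feature).
df["ad_id_encoded"] = df["ad_id"].map(stats["click_rate"])

# Same numbers as the target mean per category.
print(stats["click_rate"].tolist())                # [0.666..., 0.0, 1.0]
print(df.groupby("ad_id")["clicked"].mean().tolist())
```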

Hi @GLevV, I didn't check the source code of the KBinsDiscretizer when the strategy is kmeans, but according to the docs: **"'kmeans' strategy defines bins based on a k-means clustering procedure...
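The docs go on to say that the k-means procedure is performed on each feature independently. A quick empirical check is sketched below. It fits the discretizer on two columns at once and then on each column alone, and compares the learned `bin_edges_`. If the binning is per feature, the edges should match. The random data and `n_bins=4` are arbitrary choices for the check.

```python
import numpy as np
from sklearn.preprocessing import KBinsDiscretizer

rng = np.random.default_rng(0)
# Two features with different distributions.
X = np.column_stack([rng.normal(0, 1, 200), rng.exponential(5, 200)])

joint = KBinsDiscretizer(n_bins=4, encode="ordinal", strategy="kmeans").fit(X)

matches = []
for j in range(X.shape[1]):
    # Fit on column j alone and compare with the edges learned jointly.
    single = KBinsDiscretizer(n_bins=4, encode="ordinal", strategy="kmeans").fit(X[:, [j]])
    matches.append(bool(np.allclose(joint.bin_edges_[j], single.bin_edges_[0])))

print(matches)  # expected [True, True] if each feature is binned on its own
```

Each column's edges depend only on that column's values, so the output is one bin index per feature rather than a joint cluster label.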

Hey @pgschr, this is a very important question, and we get it a lot. We should measure it. At the moment, we are focusing our efforts on the next release,...

@pgschr, by any chance did you test this? It sounds like you are in a good position to test the speed of feature-engine on your dataset, and maybe even compare it with the...
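For anyone who wants to run such a test, the sketch below is a minimal timing harness built on `time.perf_counter`. The helper `time_fit_transform` is hypothetical. scikit-learn's `SimpleImputer` is only a stand-in: swap in the feature-engine transformer you want to measure, and whatever you compare it against (elided in the comment above). The synthetic data are also an assumption. Use your own dataset for meaningful numbers.

```python
import time

import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer


def time_fit_transform(transformer, X, repeats=5):
    """Hypothetical helper: best wall-clock time (s) of fit_transform over `repeats` runs."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        transformer.fit_transform(X)
        best = min(best, time.perf_counter() - start)
    return best


# Synthetic stand-in data: 10,000 rows, 20 columns, roughly 10% missing.
rng = np.random.default_rng(0)
X = pd.DataFrame(rng.normal(size=(10_000, 20)))
X = X.mask(rng.random(X.shape) < 0.1)

elapsed = time_fit_transform(SimpleImputer(strategy="mean"), X)
print(f"best of 5: {elapsed:.4f} s")
```

Taking the best of several runs reduces noise from other processes. Running each candidate on the same data in the same session keeps the comparison fair.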