kmcuda

Feature Request: K-Prototypes or similar implementation for clustering mixed data

Open plutonium-239 opened this issue 4 years ago • 0 comments

I would love to have an optimised (or at least CUDA) implementation of the K-Prototypes algorithm (the package I currently use is kmodes). A lot of data science deals with categorical data, and it would be great not to have to use TargetEncoders or, worse, pd.get_dummies() for categorical data with a lot of categories. Right now, my solution is to apply a TargetEncoder to the categorical variables and then use the kmeans/knn in this package, which feels a little 'fix'-ey: numerical data is continuous and its values have an ordering, whereas the values of a categorical variable need not have any such relation (greater than/less than) to each other.
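To make the request concrete, here is a minimal sketch of the dissimilarity that K-Prototypes is built on: squared Euclidean distance on the numeric attributes plus a weight times the number of mismatching categorical attributes. It is plain Python for illustration only, not kmcuda's or kmodes' API. The names `mixed_dissimilarity` and `assign`, the weight `gamma`, and the sample records are all assumptions made up for this example.

```python
def mixed_dissimilarity(num_a, cat_a, num_b, cat_b, gamma=1.0):
    """K-Prototypes-style dissimilarity between two mixed records (illustrative)."""
    # Numeric part: squared Euclidean distance.
    numeric = sum((x - y) ** 2 for x, y in zip(num_a, num_b))
    # Categorical part: simple matching, i.e. the count of attributes that differ.
    mismatches = sum(1 for x, y in zip(cat_a, cat_b) if x != y)
    # gamma (hypothetical default) balances the two parts.
    return numeric + gamma * mismatches


def assign(points, prototypes, gamma=1.0):
    """Return the index of the cheapest prototype for each (numeric, categorical) point."""
    labels = []
    for num, cat in points:
        costs = [
            mixed_dissimilarity(num, cat, p_num, p_cat, gamma)
            for p_num, p_cat in prototypes
        ]
        labels.append(costs.index(min(costs)))
    return labels


# Made-up sample data: each record is (numeric tuple, categorical tuple).
points = [
    ((1.0, 2.0), ("red", "small")),
    ((1.2, 1.9), ("red", "large")),
    ((8.0, 9.0), ("blue", "large")),
]
prototypes = [
    ((1.0, 2.0), ("red", "small")),
    ((8.0, 9.0), ("blue", "large")),
]
labels = assign(points, prototypes, gamma=0.5)
print(labels)
```

The categorical columns are compared only for equality, so no ordering is imposed on them, which is the property that target encoding or one-hot encoding followed by plain k-means does not give directly. The assignment step is independent per point, which is why a CUDA version seems plausible.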

plutonium-239, Apr 17 '21 14:04