jasp-issues
[Feature Request]: extend clustering algorithms to handle categorical and mixed-type data
Description
No response
Purpose
Improve the range of data types that can be clustered
Use-case
No response
Is your feature request related to a problem?
Currently, categorical and mixed-type data cannot be clustered in JASP.
Is your feature request related to a JASP module?
Machine Learning
Describe the solution you would like
k-prototypes clustering and Gower distances
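For reference, here is a minimal Python sketch of the idea behind k-prototypes. It is a simplified batch variant (assign every object, then update every prototype). It is not clustMixType's `kproto()` implementation and may differ from the update scheme in Huang (1998). The names `k_prototypes`, `kproto_distance` and `gamma` are hypothetical. `gamma` plays the role of the weight on the categorical part of the cost. The sketch also assumes that there are no missing values and that the first k rows serve as initial prototypes.

```python
from collections import Counter
from typing import List, Sequence, Tuple


def kproto_distance(x_num: Sequence[float], x_cat: Sequence[str],
                    p_num: Sequence[float], p_cat: Sequence[str],
                    gamma: float) -> float:
    """Squared Euclidean distance on the numeric variables plus gamma
    times the number of mismatched categorical variables."""
    numeric = sum((a - b) ** 2 for a, b in zip(x_num, p_num))
    mismatches = sum(1 for a, b in zip(x_cat, p_cat) if a != b)
    return numeric + gamma * mismatches


def k_prototypes(num: List[List[float]], cat: List[List[str]], k: int,
                 gamma: float = 1.0, max_iter: int = 100
                 ) -> Tuple[List[int], List[List[float]], List[List[str]]]:
    # Assumption: the first k rows are the initial prototypes (real
    # implementations usually use random or smarter initialisation).
    protos_num = [list(row) for row in num[:k]]
    protos_cat = [list(row) for row in cat[:k]]
    labels: List[int] = []
    for _ in range(max_iter):
        # Assignment step: each object goes to its nearest prototype.
        new_labels = [
            min(range(k), key=lambda j: kproto_distance(
                xn, xc, protos_num[j], protos_cat[j], gamma))
            for xn, xc in zip(num, cat)
        ]
        if new_labels == labels:
            break  # converged: no object changed cluster
        labels = new_labels
        # Update step: numeric part = mean, categorical part = mode.
        for j in range(k):
            members = [i for i, lab in enumerate(labels) if lab == j]
            if not members:
                continue  # keep the previous prototype for an empty cluster
            protos_num[j] = [sum(num[i][d] for i in members) / len(members)
                             for d in range(len(num[0]))]
            protos_cat[j] = [Counter(cat[i][d] for i in members).most_common(1)[0][0]
                             for d in range(len(cat[0]))]
    return labels, protos_num, protos_cat
```

In practice the numeric variables are usually standardised first. `gamma` then controls how much a categorical mismatch counts relative to the numeric distance.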
Describe alternatives that you have considered
No response
Additional context
k-prototypes clustering (Huang, 1998) using the clustMixType package, and perhaps Gower distance (Gower, 1971) via the gower package. I have also included a few reviews covering the wide variety of other methods.
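To illustrate what Gower distance computes, here is a minimal Python sketch. It is not the gower package's API, and the function name `gower_matrix` is hypothetical. For a numeric variable, the contribution is the absolute difference divided by that variable's range. For a categorical variable, the contribution is 0 if the categories match and 1 otherwise. The distance is the average of these contributions. The sketch assumes no missing values and equal weights for all variables. Gower's (1971) coefficient also allows per-variable weights and the skipping of missing comparisons; both are omitted here.

```python
from typing import List, Optional, Sequence, Union

Value = Union[float, str]


def gower_matrix(rows: Sequence[Sequence[Value]],
                 is_numeric: Sequence[bool]) -> List[List[float]]:
    """Pairwise Gower distances (1 - Gower similarity) for mixed-type rows.

    Assumes no missing values and equal weights for every variable.
    """
    n_vars = len(is_numeric)
    # Range R_k of each numeric variable; None for categorical variables.
    ranges: List[Optional[float]] = []
    for k in range(n_vars):
        if is_numeric[k]:
            column = [float(row[k]) for row in rows]
            ranges.append(max(column) - min(column))
        else:
            ranges.append(None)

    def distance(a: Sequence[Value], b: Sequence[Value]) -> float:
        total = 0.0
        for k in range(n_vars):
            r = ranges[k]
            if r is not None:
                # Numeric: absolute difference scaled into [0, 1] by the range.
                total += abs(float(a[k]) - float(b[k])) / r if r > 0 else 0.0
            else:
                # Categorical: 0 if the categories match, 1 otherwise.
                total += 0.0 if a[k] == b[k] else 1.0
        return total / n_vars

    return [[distance(a, b) for b in rows] for a in rows]
```

The resulting matrix can then be passed to any distance-based method, such as hierarchical clustering or k-medoids (PAM).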
Ahmad, A., & Khan, S. S. (2019). Survey of State-of-the-Art Mixed Data Clustering Algorithms. IEEE Access, 7, 31883–31902. https://doi.org/10.1109/ACCESS.2019.2903568
Gower, J. C. (1971). A General Coefficient of Similarity and Some of Its Properties. Biometrics, 27(4), 857–871. https://doi.org/10.2307/2528823
Huang, Z. (1998). Extensions to the k-Means Algorithm for Clustering Large Data Sets with Categorical Values. Data Mining and Knowledge Discovery, 2(3), 283–304. https://doi.org/10.1023/A:1009769707641
Hunt, L., & Jorgensen, M. (2011). Clustering mixed data. WIREs Data Mining and Knowledge Discovery, 1(4), 352–361. https://doi.org/10.1002/widm.33
McParland, D., & Gormley, I. C. (2016). Model based clustering for mixed data: clustMD. Advances in Data Analysis and Classification, 10(2), 155–169. https://doi.org/10.1007/s11634-016-0238-x
Szepannek, G. (2018). clustMixType: User-Friendly Clustering of Mixed-Type Data in R. The R Journal, 10(2), 200–208.
van de Velden, M., Iodice D’Enza, A., & Markos, A. (2019). Distance-based clustering of mixed data. WIREs Computational Statistics, 11(3), e1456. https://doi.org/10.1002/wics.1456