juliasilge.com
Dimensionality reduction of #TidyTuesday United Nations voting patterns | Julia Silge
Explore country-level UN voting with a tidymodels approach to unsupervised machine learning.
Quick question: how do you reconcile applying PCA to discrete data like this (or to word counts, one-hot encodings, etc.)? I thought that if your variables don't belong on a coordinate plane, you shouldn't apply PCA to them.
I've been struggling with this concept, especially for unsupervised learning with text data and sparse data. Are there any plans to bring other unsupervised methods into recipes, à la ClusterR, cluster, or FactoMineR? If not, are there any methods you like to use yourself when doing EDA/unsupervised learning?
Oh, you definitely have a point about whether this is the best data for something like PCA; take a look at the plot showing which roll call votes contribute to the principal components and notice how the values are all sort of the same. I don't think that means you can never apply dimensionality reduction to something like indicator variables, though.
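To make the mechanics concrete: PCA is just an SVD of the column-centered data matrix, so nothing in the computation requires continuous inputs; it runs the same way on 0/1 indicator columns. Whether the resulting components are *meaningful* is the separate question raised above, and inspecting the loadings is one way to check. The sketch below is a minimal illustration in Python/NumPy rather than the R/tidymodels code from the post; the tiny `votes` matrix and the `pca` helper are hypothetical, made up for illustration only.

```python
import numpy as np

# Hypothetical toy data (not from the post): rows are countries,
# columns are roll call votes coded as 1 = "yes", 0 = "no".
votes = np.array([
    [1, 1, 0, 1],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [0, 1, 1, 1],
    [1, 0, 0, 0],
], dtype=float)


def pca(x, n_components=2):
    """Hypothetical helper: PCA via SVD of the column-centered matrix.

    Returns (scores, loadings, share of variance explained per component).
    """
    centered = x - x.mean(axis=0)  # center each vote column
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]  # country coordinates
    loadings = vt[:n_components].T  # how much each vote contributes
    explained = s ** 2 / np.sum(s ** 2)  # variance share per component
    return scores, loadings, explained[:n_components]


scores, loadings, explained = pca(votes)
print("scores:\n", scores.round(3))
print("loadings:\n", loadings.round(3))
print("share of variance explained:", explained.round(3))
```

If the loadings on the first components come out roughly equal across all votes, as in the contribution plot mentioned above, that suggests the components are not separating the votes much, even though the computation itself is perfectly valid for binary columns.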
There are quite a number of unsupervised methods available in recipes and the recipes-adjacent packages like embed, including ones specifically for categorical data.
When it comes to clustering specifically, we are gathering thoughts and community feedback in this planning repo PR.
I learn something every time. Thank you for this fantastic content!
Thanks! Please don't stop doing what you are doing! I love it!!