Dimensionality reduction for #TidyTuesday Billboard Top 100 songs | Julia Silge
Songs on the Billboard Top 100 have many audio features. We can use data preprocessing recipes to implement dimensionality reduction and understand how these features are related.
Great tutorial! I wonder, though: could we interpret the PCA results further? As far as I can see, the post only describes the relationships among the predictors in the PCA.
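One common way to read more out of a PCA is to inspect the loadings, which show how strongly each original variable contributes to each component. The sketch below is only an illustration in base R: it uses the built-in mtcars data as a stand-in for the song features, and the object names are made up for this example rather than taken from the post.

```r
# Illustrative stand-in data: mtcars, not the Billboard audio features
pca <- prcomp(mtcars[, -1], center = TRUE, scale. = TRUE)

# Loadings: one row per original variable, one column per component
loadings <- pca$rotation[, c("PC1", "PC2")]

# Order variables by the size of their contribution to PC1
pc1_order <- order(abs(loadings[, "PC1"]), decreasing = TRUE)
loadings_by_pc1 <- loadings[pc1_order, ]

print(round(loadings_by_pc1, 2))
```

Variables with large loadings of the same sign on a component move together along it, while opposite signs indicate a contrast between them.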
For anyone interested in UMAP's limitations as discussed in the linked Twitter thread, I think Dmitry Kobak's contrary view to Lior Pachter's on this matter deserves a lot of attention.
(Big old disclaimer: I wrote the 'uwot' package that 'embed' uses for its UMAP implementation and lucked into being a co-author on the UMAP paper so I am not exactly an impartial observer on this one!)
That is great @jlmelville; thanks so much for sharing this, and also for your work on uwot. 🙌
Great post as always Julia. Thanks for sharing.
I also wanted to share my approach to understanding correlations in exploratory analysis, which I find much easier to interpret. Using lares::corr_var and the same variables used in this post, you'd get something like this.
https://i.ibb.co/ZHDqGRp/Screen-Shot-2021-09-20-at-18-57-54.png
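For readers who cannot view the screenshot, the general idea of ranking every variable by its correlation with one variable of interest can be sketched in base R. This is not the lares implementation, only the same idea under stated assumptions: mtcars stands in for the song data, and "mpg" is an arbitrary choice of target.

```r
# Illustrative stand-in data and target; neither comes from the post
target <- "mpg"

# Correlation of every numeric column with the target
cors <- cor(mtcars)[, target]
cors <- cors[names(cors) != target]

# Rank by absolute correlation, strongest first, keeping the sign
ranked <- cors[order(abs(cors), decreasing = TRUE)]

print(round(ranked, 2))
```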
Hi Julia, thank you so much for your work, I'm learning so much!! I have a basic question. I am analyzing my data with PLS using this tutorial, but I can't figure out how to retrieve the information on how much variation is explained by the components. Is it possible using the package?
@Emily-Zh-bio We have this implemented for PCA via the tidy() method; you can check out the code here. We don't have that implemented for PLS, though. If you'd like, you could open an issue on the recipes repo about this as a new feature. I do believe the info is in there, if you dig around, something like prepped$steps[[your_step_number]]$res$sd, so it should be doable!
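For the PCA case, a minimal sketch of pulling the variance explained out of a prepped recipe with tidy() might look like the following. It assumes the recipes package is installed and uses mtcars as stand-in data; the object names and the choice of num_comp are illustrative only, and nothing here applies to PLS.

```r
library(recipes)

# Illustrative recipe: mtcars stands in for the Billboard audio features
pca_rec <- recipe(mpg ~ ., data = mtcars) %>%
  step_normalize(all_numeric_predictors()) %>%
  step_pca(all_numeric_predictors(), num_comp = 4)

pca_prep <- prep(pca_rec)

# number = 2 picks the second step in the recipe, which is step_pca here
pca_var <- tidy(pca_prep, number = 2, type = "variance")

# Keep only the rows reporting percent variance per component
pct_var <- subset(pca_var, terms == "percent variance")
print(pct_var)
```

The number argument has to match the position of the PCA step in your own recipe, so it will differ if you add or remove steps.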
FYI - to get the PLS graphs to work correctly, I needed to add mixOmics. Otherwise, the chart came out looking like a correlation plot by individual features (not PLS1, PLS2...). Code used:
if (!require("BiocManager", quietly = TRUE)) install.packages("BiocManager")
BiocManager::install("mixOmics")
Cheers!
Thanks once again for this clear and useful use case, Julia! Do you think UMAP could be applicable when dealing with categorical data? To summarize my issue: I am using MCA rather than PCA, with the same objective of reducing the dimensionality.
@LDSTATXPERT I don't believe that UMAP handles categorical data natively, but you could try creating dummy/indicator variables and see how that goes.
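As a rough sketch of the dummy/indicator approach, base R's model.matrix() can expand factors into 0/1 columns that a numeric method can consume. The toy data frame and its column names below are hypothetical, and the commented-out uwot::umap() call is only a suggestion of where the matrix would go next, assuming uwot is installed.

```r
# Hypothetical toy data with two categorical columns and one numeric column
toy <- data.frame(
  genre = factor(c("pop", "rock", "pop", "rap")),
  explicit = factor(c("no", "yes", "no", "yes")),
  tempo = c(120, 140, 100, 90)
)

# Full indicator coding: one 0/1 column per factor level, no reference level dropped
full_contrasts <- lapply(Filter(is.factor, toy), contrasts, contrasts = FALSE)
indicators <- model.matrix(~ . - 1, data = toy, contrasts.arg = full_contrasts)

print(colnames(indicators))

# Possible next step (not run here; n_neighbors is an arbitrary small value):
# embedding <- uwot::umap(indicators, n_neighbors = 3)
```

Whether distances between indicator vectors are meaningful for your data is a separate question, so it is worth comparing the result against MCA.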