
Dimensionality reduction for #TidyTuesday Billboard Top 100 songs | Julia Silge

Open utterances-bot opened this issue 3 years ago • 9 comments


Songs on the Billboard Top 100 have many audio features. We can use data preprocessing recipes to implement dimensionality reduction and understand how these features are related.

https://juliasilge.com/blog/billboard-100/

utterances-bot, Sep 16 '21

Great tutorial! But I wonder: could we interpret the results of the PCA further? As far as I can see, the post only describes how the predictors relate to each other in the PCA.

nguyenlovesrpy, Sep 16 '21

For anyone interested in UMAP's limitations as discussed in the linked Twitter thread, I think Dmitry Kobak's counterpoint to Lior Pachter on this matter deserves a lot of attention.

(Big old disclaimer: I wrote the 'uwot' package that 'embed' uses for its UMAP implementation and lucked into being a co-author on the UMAP paper, so I am not exactly an impartial observer on this one!)

jlmelville, Sep 16 '21

That is great @jlmelville; thanks so much for sharing this, and also for your work on uwot. 🙌

juliasilge, Sep 16 '21

Great post as always, Julia. Thanks for sharing. I also wanted to share my approach to understanding correlations during exploratory analysis, which I find much easier to interpret, detect, and understand. Using lares::corr_var() with the same variables used in this post, you'd get something like this: https://i.ibb.co/ZHDqGRp/Screen-Shot-2021-09-20-at-18-57-54.png

laresbernardo, Sep 21 '21

Hi Julia, thank you so much for your work; I'm learning so much! I have a basic question: I am analyzing my data with PLS following this tutorial, but I can't figure out how to retrieve how much variation is explained by each component. Is that possible with the package?

Emily-Zh-bio, Nov 17 '21

@Emily-Zh-bio We have this implemented for PCA via the tidy() method; you can check out the code here. We don't have that implemented for PLS, though. If you'd like, you could open an issue on the recipes repo requesting this as a new feature. I do believe the info is in there if you dig around (something like prepped$steps[[your_step_number]]$res$sd), so it should be doable!

juliasilge, Nov 19 '21
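For the PCA case mentioned in the reply above, here is a minimal sketch of pulling out the variance explained per component. It assumes a recipes version whose tidy() method for step_pca accepts type = "variance", and it uses the built-in mtcars data purely as a stand-in. It does not cover PLS, for which, per that reply, no equivalent tidy() output exists.

```r
library(recipes)

# mtcars is only a stand-in data set; every column is a numeric predictor here
pca_rec <- recipe(~ ., data = mtcars) %>%
  step_normalize(all_numeric_predictors()) %>%       # step 1: center and scale
  step_pca(all_numeric_predictors(), num_comp = 3)   # step 2: PCA

pca_prep <- prep(pca_rec)

# Assumption: type = "variance" returns variance-related terms per component
pca_var <- tidy(pca_prep, number = 2, type = "variance")

# Keep only the percent of total variance explained by each component
pct <- pca_var[pca_var$terms == "percent variance", ]
pct
```

The number = 2 argument points tidy() at the second step of the recipe (the PCA step), since the normalization step comes first.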

FYI: to get the PLS graphs to work correctly, I needed to install mixOmics. Otherwise, the chart came out looking like a correlation plot of the individual features (not PLS1, PLS2, ...). Code used:

if (!require("BiocManager", quietly = TRUE)) install.packages("BiocManager")
BiocManager::install("mixOmics")

Cheers!

tamcdevittbit, Feb 17 '22

Thanks once again for this clear and useful use case, Julia! Do you think UMAP could be applied to categorical data? To summarize my issue: I'd like to use MCA (multiple correspondence analysis) rather than PCA, with the same goal of reducing dimensionality.

LDSTATXPERT, Apr 29 '22

@LDSTATXPERT I don't believe that UMAP handles categorical data natively, but you could try creating dummy/indicator variables and see how that goes.

juliasilge, Apr 30 '22
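A minimal sketch of the dummy-variable suggestion above, assuming the recipes and embed packages are installed (embed's step_umap() relies on uwot). The built-in iris data stands in for real data, with its Species factor playing the categorical predictor; one-hot encoding followed by normalization is just one illustrative choice, not an established substitute for MCA.

```r
library(recipes)
library(embed)  # provides step_umap(); needs the uwot package installed

# UMAP is stochastic; set a seed before building the recipe
set.seed(123)

# iris is only a stand-in: 4 numeric columns plus 1 factor (Species)
umap_rec <- recipe(~ ., data = iris) %>%
  # Turn the categorical column into 0/1 indicator columns, one per level
  step_dummy(all_nominal_predictors(), one_hot = TRUE) %>%
  # Put indicator and numeric columns on a common scale
  step_normalize(all_numeric_predictors()) %>%
  # Embed all predictors into two UMAP dimensions
  step_umap(all_numeric_predictors(), num_comp = 2)

umap_data <- bake(prep(umap_rec), new_data = NULL)
head(umap_data)
```

How much the indicator columns dominate the embedding depends on the scaling choice, so it is worth comparing results with and without step_normalize() on your own data.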