juliasilge.com
PCA and the #TidyTuesday best hip hop songs ever | Julia Silge
Lately I’ve been publishing screencasts demonstrating how to use the tidymodels framework, from first steps in modeling to how to tune more complex models. Today, I’m exploring a different part of the tidymodels framework; I’m showing how to implement principal component analysis via recipes with this week’s #TidyTuesday dataset on the best hip hop songs of all time as determined by a BBC poll of music critics.
Thank you so much for sharing your work; I am learning a lot from you! Just wanted to note that the link to the screencast isn't working :(, but was able to find the link on YouTube: https://www.youtube.com/watch?v=OvgzIx5mDNM for anyone looking for it :D.
@gabbypaola Hmmm, that's strange. The video looks fine in the blog post to me. Can you tell me if there is anything unusual about your browser, OS, any firewalls/blocker, or similar? Do you see the embedded videos in the rest of my blog posts?
Hi Julia, thanks for checking this out! I was able to get the video to work on this page. Looks like it was something on my browser settings.
ranking_prep has 12 principal components, but juice(ranking_prep) only has 5. Does juice drop the less significant PCs based on a default threshold? Is there a way to change that threshold in juice() or bake()?
@sf210 I don't believe that ranking_prep has 12 principal components; it has 12 predictors that are used in PCA. I used the default num_comp = 5 in the PCA extraction. If you would like more, you should change that argument.
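As a minimal sketch of changing that argument: the example below uses the built-in mtcars data as a stand-in for the song data (an assumption, since the original dataset is not reproduced here) and asks step_pca() for 8 components instead of the default 5.

```r
library(recipes)

# mtcars is a stand-in dataset; mpg plays the role of the outcome
pca_rec <- recipe(mpg ~ ., data = mtcars) |>
  step_normalize(all_predictors()) |>
  # num_comp controls how many components are kept (default is 5)
  step_pca(all_predictors(), num_comp = 8)

pca_prep <- prep(pca_rec)

# new_data = NULL returns the processed training data, like juice()
pcs <- bake(pca_prep, new_data = NULL)
names(pcs)
```

The number of components is fixed when the recipe is defined and prepped, so it is changed in step_pca() rather than in juice() or bake().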
First, a big thank you for all your videos! I am learning so much and it is helping me. You put "points" as the outcome variable in the recipe() step, whereas in your other PCA videos (UN voting, cocktail recipes) you did not (in chapter 16 of your TMWR book you also put "class" as an outcome variable). Is it as simple as saying that when you don't put an outcome, it is "unsupervised", whereas if you put an outcome (e.g. left of the ~) it is now "supervised"? If my understanding is correct, are there any sources to understand how "supervised" PCA compares to "unsupervised" PCA? Thanks!
@alejandrohagan The actual PCA algorithm is always unsupervised and does not use the info from the outcome. When you use different kinds of formulas in a recipe, like points ~ . compared to ~ . , this is only about how the recipe understands the roles of the variables. When you use ~ . , the recipe treats all variables as predictors, with no outcomes at all.
If you are interested in an actual supervised dimensionality reduction approach, check out Ch 16 of our book.
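To see the difference in roles concretely, here is a small sketch using the built-in mtcars data (a stand-in, not the dataset from the post), with mpg taking the place of points. summary() on a recipe lists each variable with its assigned role.

```r
library(recipes)

# Formula with an outcome on the left-hand side
rec_outcome <- recipe(mpg ~ ., data = mtcars)

# Formula with no left-hand side: every variable is a predictor
rec_all <- recipe(~ ., data = mtcars)

roles_outcome <- summary(rec_outcome)
roles_all <- summary(rec_all)

table(roles_outcome$role)
table(roles_all$role)
```

With the second formula, a step that selects all_predictors() would also include mpg, so the choice of formula changes which columns go into the PCA, not how the PCA itself is computed.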
If you have not used the spotifyr package, you will need to do a few steps to replicate these results.
- Install the spotifyr package from GitHub (it is not on CRAN currently):
devtools::install_github('charlie86/spotifyr')
- Make a Spotify developer account. You will be prompted to create an account if you go here.
- Follow the instructions to get the API details the package needs. They are here.
- Follow the instructions for setting the API details. They are here.
Do what Julia does...
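A rough sketch of the last two steps, assuming the environment variable names that spotifyr documents for its credentials. The two values below are placeholders, not real credentials, and the token request will only succeed once they are replaced with the ones from your own developer account.

```r
library(spotifyr)

# Placeholder values: replace with the client ID and secret
# shown in your Spotify developer dashboard
Sys.setenv(SPOTIFY_CLIENT_ID = "your-client-id")
Sys.setenv(SPOTIFY_CLIENT_SECRET = "your-client-secret")

# Requests an access token using the credentials set above
access_token <- get_spotify_access_token()
```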
Hi Julia, thank you ever so much for your videos. I am following along with them and learning so much. This one connecting to Spotify is an extensive source of information, and I am deeply grateful for it. You managed to teach me how to use maps, which I thought was an impossible task. I've been encountering a problem, though: if I try to do tidy(ranking_prep) it returns the following error:
Error in UseMethod("tidy") :
no applicable method for 'tidy' applied to an object of class "recipe"
Would you be able to explain to me why?
@acarpignani Oh, that does seem pretty strange. You should definitely be able to tidy() a recipe object. Are you able to run the examples in those docs?
@juliasilge thank you ever so much for replying. For some reason, if I put library(spotifyr) before library(tidymodels) it works perfectly as usual, but if I only follow your steps, it gives me the error I mentioned.
@acarpignani Ah, that smells of a namespace conflict! You may want to read up on the conflicted package, and specifically consider using tidymodels_prefer().
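A minimal sketch of that suggestion, using the built-in mtcars data and a hypothetical small recipe rather than ranking_prep from the post. The idea is to call tidymodels_prefer() once after all packages are attached (including spotifyr, if you are using it), so that conflicting function names resolve in favor of the tidymodels versions.

```r
library(tidymodels)

# Declare a preference for tidymodels functions when names clash;
# call this after every other library() call in the script
tidymodels_prefer()

# Hypothetical example recipe standing in for ranking_prep
example_prep <- recipe(mpg ~ ., data = mtcars) |>
  step_normalize(all_predictors()) |>
  step_pca(all_predictors()) |>
  prep()

# One row per step in the recipe
step_overview <- tidy(example_prep)

# Loadings from the second step (the PCA)
pca_loadings <- tidy(example_prep, number = 2)
```

If tidy() still dispatches to the wrong function, comparing the output of conflicted::conflict_scout() before and after loading each package may help identify which one is masking it.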
@juliasilge, thank you so much. Will certainly do it. Thank you for the advice, and thank you so very much for the videos you have made: I am following along every single one of them, and they are so informative and useful. I have learnt so much from you.