PCA and the #TidyTuesday best hip hop songs ever | Julia Silge

Open utterances-bot opened this issue 3 years ago • 13 comments

Lately I’ve been publishing screencasts demonstrating how to use the tidymodels framework, from first steps in modeling to how to tune more complex models. Today, I’m exploring a different part of the tidymodels framework; I’m showing how to implement principal component analysis via recipes with this week’s #TidyTuesday dataset on the best hip hop songs of all time as determined by a BBC poll of music critics.

https://juliasilge.com/blog/best-hip-hop/

utterances-bot avatar Jul 12 '21 02:07 utterances-bot

Thank you so much for sharing your work; I am learning a lot from you! Just wanted to note that the link to the screencast isn't working :(, but I was able to find the video on YouTube: https://www.youtube.com/watch?v=OvgzIx5mDNM for anyone looking for it :D.

gabbypaola avatar Dec 07 '21 22:12 gabbypaola

@gabbypaola Hmmm, that's strange. The video looks fine in the blog post to me. Can you tell me if there is anything unusual about your browser, OS, any firewalls/blocker, or similar? Do you see the embedded videos in the rest of my blog posts?

juliasilge avatar Dec 08 '21 03:12 juliasilge

Hi Julia, thanks for checking this out! I was able to get the video to work on this page. Looks like it was something in my browser settings.

gabbypaola avatar Dec 08 '21 15:12 gabbypaola

ranking_prep has 12 principal components, but juice(ranking_prep) only has 5. Does juice() drop the less significant PCs based on a default threshold? Is there a way to change that threshold in juice() or bake()?

sf210 avatar May 26 '22 01:05 sf210

@sf210 I don't believe that ranking_prep has 12 principal components; it has 12 predictors that are used in PCA. I used the default num_comp = 5 in the PCA extraction. If you would like more, you should change that argument.
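
To make the num_comp point concrete, here is a minimal sketch. It assumes the built-in mtcars data (with mpg as a stand-in outcome) instead of the post's Spotify audio features, so the recipe and object names are illustrative and not the ones from the blog post.

```r
library(recipes)

# Assumption: mtcars stands in for the post's data; it has 10 predictors besides mpg.
base_rec <- recipe(mpg ~ ., data = mtcars) |>
  step_normalize(all_predictors())

# Default: step_pca() keeps num_comp = 5 components.
default_scores <- base_rec |>
  step_pca(all_predictors()) |>
  prep() |>
  bake(new_data = NULL)

# Ask for more components by setting num_comp explicitly.
more_scores <- base_rec |>
  step_pca(all_predictors(), num_comp = 8) |>
  prep() |>
  bake(new_data = NULL)

# Count the PC columns returned in each case.
n_default <- sum(grepl("^PC", names(default_scores)))
n_more <- sum(grepl("^PC", names(more_scores)))
```

The number of components is fixed in the step_pca() call, not in juice() or bake(); those functions only return what the prepped recipe already computed. step_pca() also has a threshold argument for choosing components by the fraction of variance covered, which may be closer to what the question had in mind.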

juliasilge avatar May 26 '22 02:05 juliasilge

First, a big thank you for all your videos! I am learning so much and it is helping me. You put "points" as the outcome variable in the recipe() step, whereas in your other PCA videos (UN voting, cocktail recipes) you did not (in chapter 16 of your TMWR book you also put "class" as an outcome variable). Is it as simple as saying that when you don't put an outcome, it is "unsupervised", whereas if you put an outcome (i.e. on the left of the ~) it is now "supervised"? If my understanding is correct, are there any sources to understand how "supervised" PCA compares to "unsupervised" PCA? Thanks!

alejandrohagan avatar May 29 '22 22:05 alejandrohagan

@alejandrohagan The actual PCA algorithm is always unsupervised and does not use the info from the outcome. When you use different kinds of formulas in a recipe, like points ~ . compared to ~ ., this is only about how the recipe understands the roles of the variables. When you use ~ ., the recipe treats all variables as predictors, with no outcomes at all.

If you are interested in an actual supervised dimensionality reduction approach, check out Ch 16 of our book.
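
The point about roles can be checked with a small sketch. It assumes the built-in mtcars data, with mpg playing the part of points; none of these object names come from the blog post.

```r
library(recipes)

# Assumption: mtcars stands in for the post's data, with mpg as the outcome.
rec_with_outcome <- recipe(mpg ~ ., data = mtcars)
rec_no_outcome <- recipe(~ ., data = mtcars[, names(mtcars) != "mpg"])

# The formula only changes the roles the recipe records for each variable.
roles_with <- summary(rec_with_outcome)$role
roles_without <- summary(rec_no_outcome)$role

# Helper (hypothetical name): normalize the predictors, run PCA, return the loadings.
pca_loadings <- function(rec) {
  prepped <- rec |>
    step_normalize(all_predictors()) |>
    step_pca(all_predictors()) |>
    prep()
  tidy(prepped, number = 2)
}

# PCA sees the same predictor columns either way, so the loadings should match.
same_loadings <- isTRUE(all.equal(
  pca_loadings(rec_with_outcome)$value,
  pca_loadings(rec_no_outcome)$value
))
```

Because all_predictors() never selects the outcome column, it plays no part in the PCA step; it is simply carried through alongside the components.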

juliasilge avatar May 30 '22 18:05 juliasilge

If you have not used the spotifyr package, you will need to do a few steps to replicate these results.

  1. Install the spotifyr package from GitHub (it is not on CRAN currently):
    devtools::install_github('charlie86/spotifyr')
    
  2. Make a Spotify developer account. You will be prompted to create an account if you go here.
  3. Follow the instructions to get the API details the package needs. They are here.
  4. Follow the instructions for setting the API details. They are here.
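
As a rough sketch of step 4, spotifyr reads the credentials from two environment variables. The values below are placeholders (assumptions), to be replaced with the Client ID and Client Secret from your own developer dashboard.

```r
# Placeholder values (assumptions): substitute your own Client ID and Client Secret.
Sys.setenv(SPOTIFY_CLIENT_ID = "your-client-id")
Sys.setenv(SPOTIFY_CLIENT_SECRET = "your-client-secret")

# Confirm that both variables are visible to the current R session.
credentials_set <- all(nzchar(Sys.getenv(c("SPOTIFY_CLIENT_ID", "SPOTIFY_CLIENT_SECRET"))))

# With real credentials in place, request a token (needs network access):
# access_token <- spotifyr::get_spotify_access_token()
```

Putting the two values in your .Renviron file instead keeps them out of your scripts.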

Do what Julia does...

RaymondBalise avatar Dec 14 '22 10:12 RaymondBalise

Hi Julia, thank you ever so much for your videos. I am following along with your videos and learning so much. This one connecting to Spotify is an extensive source of information, and I am deeply grateful for it. You managed to teach me how to use maps, which I thought was an impossible task. I've been encountering a problem, though: if I try to do tidy(ranking_prep) it returns the following error:

Error in UseMethod("tidy") : 
  no applicable method for 'tidy' applied to an object of class "recipe"

Would you be able to explain why?

acarpignani avatar Nov 29 '23 09:11 acarpignani

@acarpignani Oh, that does seem pretty strange. You should definitely be able to tidy() a recipe object. Are you able to run the examples in those docs?

juliasilge avatar Nov 29 '23 16:11 juliasilge

@juliasilge thank you ever so much for replying. For some reason, if I put library(spotifyr) before library(tidymodels) it works perfectly as usual, but if I only follow your steps, it gives me the error I mentioned.

acarpignani avatar Nov 29 '23 16:11 acarpignani

@acarpignani Ah, that smells of a namespace conflict! You may want to read up on the conflicted package, and specifically consider using tidymodels_prefer().
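
One way to sidestep this kind of masking, sketched here under the assumption that the built-in mtcars data stands in for the post's ranking data, is to call the generic with an explicit namespace so that the package attach order no longer matters. The object names are illustrative only.

```r
library(recipes)

# Assumption: mtcars stands in for the post's data; names are illustrative.
example_prep <- recipe(mpg ~ ., data = mtcars) |>
  step_normalize(all_predictors()) |>
  step_pca(all_predictors()) |>
  prep()

# The namespace-qualified call reaches the tidy() generic that recipes provides
# a method for, even if another attached package masks the bare name tidy.
step_summary <- generics::tidy(example_prep)
step_types <- step_summary$type
```

Calling tidymodels_prefer() once after loading your packages is the less verbose route, since it sets preferences for the tidymodels functions in one go.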

juliasilge avatar Nov 29 '23 16:11 juliasilge

@juliasilge, thank you so much. Will certainly do it. Thank you for the advice, and thank you so very much for the videos you have made: I am following along every single one of them, and they are so informative and useful. I have learnt so much from you.

acarpignani avatar Nov 29 '23 16:11 acarpignani