tokenize and analyze 2016 presidential candidate rhetoric for comparison with extremist communities
Anyone interested in doing some basic word/n-gram analysis, topic models, etc. on presidential candidate speeches and press releases? Would be really interesting to see which candidates were/weren't plugged in to the extremist communities and when/where certain extremist language creeps into more mainstream campaign discourse.
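For anyone wondering what the basic word/n-gram step might look like, here is a minimal sketch in Python using only the standard library. The function names `tokenize` and `ngram_counts` are made up for illustration, the tokenizer is deliberately crude (lowercase letters and apostrophes only), and the sample sentence is invented rather than taken from any real speech.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and pull out word tokens (letters and apostrophes only)."""
    return re.findall(r"[a-z']+", text.lower())


def ngram_counts(text, n=2):
    """Count every run of n consecutive tokens in the text."""
    tokens = tokenize(text)
    # zip over n staggered copies of the token list to form the n-grams
    grams = zip(*(tokens[i:] for i in range(n)))
    return Counter(" ".join(gram) for gram in grams)


# Invented sample text, not from any candidate document.
sample = "We will win. We will win again, and we will keep winning."
top_bigrams = ngram_counts(sample, n=2).most_common(3)
print(top_bigrams)
```

Note that this tokenizer ignores sentence boundaries, so a bigram can span the end of one sentence and the start of the next. A real analysis would probably want a proper tokenizer and stop-word handling.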
An R notebook with instructions and code for obtaining this data from The American Presidency Project will be in the exploratory_notebooks folder soon (just submitted a pull request).
Some examples of what's possible are in my personal GitHub repo.
FWIW, this should be a beginner-friendly project, but also open to more advanced algorithmic analysis.
I'm interested in learning R and think this is an interesting project.
@justinstimatze Excellent! I was able to scrape all of the GOP speeches, press releases, and campaign statements from January 2015 on and assemble them into a single CSV, if that helps you explore: https://github.com/kshaffer/presidencyproject/blob/master/data/gop_2016_candidate_docs.csv
And if you're using this project to learn R, I highly recommend Tidy Text Mining. It's a free ebook explaining tools that might be helpful for this analysis.
This seems like an interesting project. Is it possible for me to join in on this project?
Hi Kshaffer,
I would like to join this project. I will be working in Python. Is it possible for me to join this project?
@princeatul Thanks for your interest. Everything @kshaffer mentioned should be doable in Python as well. If you're interested, I would suggest grabbing the data linked above and trying to tackle one task from the list in the original post, for example topic modeling.
Once you have a preliminary Jupyter notebook, open a PR to add it to the exploratory_notebooks section of this repo. I'm not an expert in this area, but I do have this tutorial in my backlog that may help you get started.
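As a possible first cell for such a notebook, here is a rough Python sketch that groups documents by candidate and counts words. The column names `candidate` and `text` are assumptions for illustration only, so check the actual header of the CSV linked above before using it. The helper `counts_by_candidate` is hypothetical, and the demo data is invented.

```python
import csv
import io
import re
from collections import Counter, defaultdict


def counts_by_candidate(csv_file, name_col="candidate", text_col="text"):
    """Return {candidate: Counter of word frequencies} from an open CSV file.

    name_col and text_col are assumed column names, not the real schema.
    """
    totals = defaultdict(Counter)
    for row in csv.DictReader(csv_file):
        words = re.findall(r"[a-z']+", row[text_col].lower())
        totals[row[name_col]].update(words)
    return dict(totals)


# Tiny invented stand-in for the real file.
demo = io.StringIO(
    "candidate,text\n"
    "A,Jobs and trade and jobs\n"
    "B,Security first\n"
    "A,More jobs\n"
)
result = counts_by_candidate(demo)
print(result["A"].most_common(2))
```

With the real data you would pass `open(path, newline="", encoding="utf-8")` instead of the `io.StringIO` stand-in, adjusting the column names and encoding to whatever the file actually uses.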
If you need any help, just visit us in the #assemble channel on Slack or post back here with any questions.
I don't see it discussed above, so I'll mention that FiveThirtyEight had a very interesting article on using latent semantic analysis to model topics across Reddit groups. Certainly a useful starting point for those interested in seeing what might be done here.