tokenize and analyze 2016 presidential candidate rhetoric for comparison with extremist communities

Open kshaffer opened this issue 7 years ago • 7 comments

Anyone interested in doing some basic word/n-gram analysis, topic models, etc. on presidential candidate speeches and press releases? Would be really interesting to see which candidates were/weren't plugged in to the extremist communities and when/where certain extremist language creeps into more mainstream campaign discourse.
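
As a rough illustration of the word/n-gram counting mentioned above, here is a minimal Python sketch using only the standard library. The function names and the sample sentence are made up for illustration and are not part of the project's data or notebooks; a real analysis would likely also remove stop words and avoid building n-grams across sentence boundaries.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def ngram_counts(text, n=2):
    """Count every run of n consecutive tokens in the text."""
    tokens = tokenize(text)
    # zip over n shifted copies of the token list to form the n-grams
    grams = zip(*(tokens[i:] for i in range(n)))
    return Counter(" ".join(gram) for gram in grams)


# Invented sample text, not taken from any speech in the dataset.
sample = "Make the country strong and make the country safe."
bigrams = ngram_counts(sample, n=2)
print(bigrams.most_common(3))
```

Running the same counts over each candidate's documents and over text from the extremist communities would give two frequency tables that can then be compared term by term.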

An R notebook with instructions and code for obtaining this data from The American Presidency Project will be in the exploratory_notebooks folder soon (just submitted a pull request).

kshaffer avatar Mar 16 '17 15:03 kshaffer

Some examples of what's possible are in my personal GitHub repo.

FWIW, this should be a beginner-friendly project, but also open to more advanced algorithmic analysis.

kshaffer avatar Mar 16 '17 16:03 kshaffer

I'm interested in learning R and think this is an interesting project.

justinstimatze avatar Mar 20 '17 14:03 justinstimatze

@justinstimatze Excellent! I was able to scrape all of the GOP speeches, press releases, and campaign statements from January 2015 on and assemble them into a single CSV, if that helps you explore: https://github.com/kshaffer/presidencyproject/blob/master/data/gop_2016_candidate_docs.csv

And if you're using this project to learn R, I highly recommend Tidy Text Mining. It's a free ebook explaining tools that might be helpful for this analysis.

kshaffer avatar Mar 20 '17 15:03 kshaffer

This seems like an interesting project. Is it possible for me to join in on this project?

ghost avatar Mar 21 '17 14:03 ghost

Hi Kshaffer,

I would like to join this project. I will be working in Python. Is it possible for me to join?

princeatul avatar Mar 28 '17 15:03 princeatul

@princeatul Thanks for your interest. Everything @kshaffer mentioned should be doable in Python as well. If you're interested, I would suggest grabbing the data linked above and trying to tackle one task from the list in the original post, e.g. topic modeling. Once you have a preliminary Jupyter notebook, open a PR to add it to the exploratory_notebooks section of this repo. I'm not an expert in this area, but I do have this tutorial in my backlog that may help you get started.

If you need any help, just visit us in the #assemble channel on Slack or post back here with any questions.

bstarling avatar Mar 29 '17 19:03 bstarling

I don't see it discussed above, so I'll mention that FiveThirtyEight had a very interesting article on using latent semantic analysis for topic modeling of Reddit groups. Certainly an interesting starting point for those interested in seeing what might be done here.
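
For anyone unfamiliar with latent semantic analysis, the sketch below shows the general idea on a toy corpus: build a term-document count matrix, take its leading singular vectors, and compare documents in the reduced space. This is a generic, dependency-free illustration and not a reproduction of the FiveThirtyEight method. The function names and the four toy documents are invented, and the power-iteration SVD here stands in for the library SVD routine you would use on real data.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def term_document_matrix(docs):
    """Rows are terms, columns are documents, cells are raw counts."""
    vocab = sorted({tok for doc in docs for tok in tokenize(doc)})
    row_of = {term: i for i, term in enumerate(vocab)}
    matrix = [[0.0] * len(docs) for _ in vocab]
    for col, doc in enumerate(docs):
        for term, count in Counter(tokenize(doc)).items():
            matrix[row_of[term]][col] = float(count)
    return vocab, matrix


def lsa_document_vectors(matrix, k=2, iterations=200):
    """Project each document onto the top-k latent dimensions.

    Uses power iteration with deflation on the document-by-document
    Gram matrix (A^T A), which yields the right singular vectors of A.
    """
    n_docs = len(matrix[0])
    gram = [[sum(row[i] * row[j] for row in matrix) for j in range(n_docs)]
            for i in range(n_docs)]
    components = []  # list of (singular value, right singular vector)
    for _component in range(k):
        vec = [1.0 + 0.1 * i for i in range(n_docs)]  # deterministic start
        for _step in range(iterations):
            new = [sum(g * v for g, v in zip(gram_row, vec)) for gram_row in gram]
            # Deflation: remove directions that were already extracted.
            for _sigma, found in components:
                dot = sum(a * b for a, b in zip(new, found))
                new = [a - dot * b for a, b in zip(new, found)]
            norm = math.sqrt(sum(a * a for a in new))
            if norm < 1e-12:  # nothing left to extract
                vec = [0.0] * n_docs
                break
            vec = [a / norm for a in new]
        gram_vec = [sum(g * v for g, v in zip(gram_row, vec)) for gram_row in gram]
        eigenvalue = sum(a * b for a, b in zip(vec, gram_vec))
        components.append((math.sqrt(max(eigenvalue, 0.0)), vec))
    # Document d's coordinate on each axis is sigma * v[d].
    return [[sigma * vec[d] for sigma, vec in components] for d in range(n_docs)]


def cosine(a, b):
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# Invented toy documents, not drawn from the campaign or Reddit data.
docs = [
    "border wall immigration",
    "border wall immigration security",
    "tax cut economy",
    "economy tax jobs",
]
vocab, matrix = term_document_matrix(docs)
vectors = lsa_document_vectors(matrix, k=2)
print(round(cosine(vectors[0], vectors[1]), 3))
print(round(cosine(vectors[0], vectors[2]), 3))
```

In this toy corpus the two border-themed documents should land nearly parallel in the latent space, while a border document and a tax document should be close to orthogonal. On real data you would typically weight the counts (for example with TF-IDF) before the decomposition.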

mw0 avatar Apr 30 '17 23:04 mw0