election-transparency

Build a simple model of the 2016 presidential election

[Open] chrisdick14 opened this issue 8 years ago · 13 comments

We have assembled a lot of data about the 2016 election. Let's build some simple models that try to explain the outcomes so that we can discuss them further.

Either of these notebooks would be a good start (if you are working in R):

https://github.com/Data4Democracy/election-transparency/blob/master/notebooks/r-notebooks/model_2016_presresults.Rmd

or

https://github.com/Data4Democracy/election-transparency/blob/master/notebooks/r-notebooks/basic-viz.Rmd

chrisdick14, Feb 13 '17 22:02

Happy to keep working on this.

jenniferthompson, Feb 14 '17 03:02

Great @jenniferthompson! This is a big one, so others are definitely welcome; we want to get several different ideas down and start comparing.

chrisdick14, Feb 14 '17 14:02

I will jump in too. I'll be using Python, though.

pghosh, Feb 15 '17 05:02

Great @pghosh! Glad to see some Python in the group!

chrisdick14, Feb 15 '17 12:02

I'd like to jump in and do some experimenting too. Are we looking at all types of models, sticking with simpler models like logistic regression, or just seeing what works best?

Are there basic goodness-of-fit statistics that we are going to use to compare them? AIC, BIC, adjusted R-squared?

chrispelkey, Feb 16 '17 01:02
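For reference, the measures asked about above have standard definitions: AIC = 2k - 2 ln L, BIC = k ln(n) - 2 ln L, and adjusted R-squared = 1 - (1 - R^2)(n - 1)/(n - p - 1). The sketch below computes them in plain Python, assuming you already have each fitted model's log-likelihood (and, for adjusted R-squared, its R-squared). Adjusted R-squared is defined for linear regression, so McFadden's pseudo R-squared is included as a commonly used substitute for logistic models; that substitution is an assumption about what would be compared, and the helper names are hypothetical, not taken from the project notebooks.

```python
import math

def aic(log_likelihood: float, k: int) -> float:
    """Akaike information criterion; k = number of estimated parameters."""
    return 2 * k - 2 * log_likelihood

def bic(log_likelihood: float, k: int, n: int) -> float:
    """Bayesian information criterion; n = number of observations (e.g. counties)."""
    return k * math.log(n) - 2 * log_likelihood

def adjusted_r2(r2: float, n: int, p: int) -> float:
    """Adjusted R-squared for a linear model with p predictors (intercept excluded)."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

def mcfadden_r2(ll_model: float, ll_null: float) -> float:
    """McFadden's pseudo R-squared: 1 - lnL(model) / lnL(intercept-only model)."""
    return 1 - ll_model / ll_null

# Hypothetical log-likelihoods for two models fit to the same 100 counties.
print(aic(-100.0, 3), bic(-100.0, 3, 100))
print(mcfadden_r2(-50.0, -100.0))
```

Lower AIC or BIC is better when the models are fit to the same counties. BIC penalizes each extra parameter by ln(n) instead of 2, so it is the stricter of the two once there are 8 or more observations.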

@chrispelkey, I started with a basic model just to get people going. I would say we can start looking at all types of models. Also, take a look here near the bottom of the document to get some further ideas of what we were thinking about modeling and trying to answer. At the end of the day, we are trying to get a good model fit, but some of the most interesting information (at least for me) will be which variables are strong predictors of a Trump or Hillary win in a county, and where we misclassify.

As for model comparison, I think you usually need more than one measure to explain the pros and cons of each method's fit. But that is just my view of model selection, so it is always up for discussion.

chrisdick14, Feb 16 '17 12:02
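To make "where do we misclassify?" concrete, one approach is to threshold each county's predicted probability, tally a confusion table, and list the counties the model gets wrong. This is a minimal plain-Python sketch, assuming each county already has a predicted probability of a Trump win. The county IDs, probabilities, the 0.5 cutoff, and the helper names are all hypothetical.

```python
from collections import Counter

def classify(probs, threshold=0.5, pos="Trump", neg="Clinton"):
    """Turn predicted P(Trump win) into a hard label using a (hypothetical) cutoff."""
    return [pos if p >= threshold else neg for p in probs]

def confusion_counts(actual, predicted):
    """Count each (actual, predicted) pair; ('Trump', 'Clinton') = Trump county predicted Clinton."""
    return Counter(zip(actual, predicted))

def misclassified(ids, actual, predicted):
    """IDs of counties whose predicted winner differs from the actual winner."""
    return [i for i, a, p in zip(ids, actual, predicted) if a != p]

def accuracy(actual, predicted):
    """Share of counties whose winner was predicted correctly."""
    return sum(a == p for a, p in zip(actual, predicted)) / len(actual)

# Hypothetical toy data: four counties with made-up IDs and probabilities.
county_ids = ["A", "B", "C", "D"]
p_trump = [0.9, 0.4, 0.6, 0.2]
actual = ["Trump", "Trump", "Clinton", "Clinton"]

predicted = classify(p_trump)
print(misclassified(county_ids, actual, predicted))  # ['B', 'C']
print(accuracy(actual, predicted))                   # 0.5
print(confusion_counts(actual, predicted))
```

Reporting the confusion table alongside accuracy fits the point above about using more than one measure: two models with the same accuracy can err on very different sets of counties.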

+1 to @chrisdick14. My plan was to start with the predictors listed here, do a redundancy analysis, and fit a logistic model for whether each county "picked the winner."

jenniferthompson, Feb 16 '17 16:02
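A logistic model for a binary outcome such as "this county picked the winner" estimates P(y = 1) = 1 / (1 + exp(-(b0 + b1*x1 + ...))). The project notebooks are in R, where `glm(..., family = binomial)` fits this directly. The plain-Python sketch below only illustrates the mechanics by maximizing the Bernoulli log-likelihood with batch gradient ascent. It does not perform the redundancy analysis and is not the model from this thread; the single standardized predictor, the toy labels, and the learning-rate and epoch settings are hypothetical.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict_proba(w, x):
    """P(y = 1 | x) for weights w = [intercept, coef_1, ..., coef_k]."""
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))

def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Fit by batch gradient ascent on the mean Bernoulli log-likelihood."""
    w = [0.0] * (len(X[0]) + 1)
    n = len(X)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = yi - predict_proba(w, xi)  # d(log-likelihood) / d(linear predictor)
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj + lr * gj / n for wj, gj in zip(w, grad)]
    return w

# Hypothetical toy data: one standardized county-level predictor,
# y = 1 if the county "picked the winner", else 0.
X = [[-2.0], [-1.0], [-0.5], [0.5], [1.0], [2.0]]
y = [0, 0, 0, 1, 1, 1]

w = fit_logistic(X, y)
print([round(predict_proba(w, xi), 2) for xi in X])
```

Because these toy points are perfectly separable, the maximum-likelihood slope is unbounded and the fitted value here depends only on how long gradient ascent runs. A library fit (R's `glm` or Python's statsmodels `Logit`) also reports standard errors, which is what you would need to judge which predictors are strong.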

I'd be happy to work on this for a bit. I have been kind of out of the loop for the past couple of weeks, but I will try to get some work done on this over the coming weekend.

fhollenbach, Feb 16 '17 16:02

Great! Thanks @fhollenbach! It's impossible to have too many of us working on this.

chrisdick14, Feb 17 '17 02:02

I took a first stab at building three simple logistic models. I don't really know how to use GitHub yet (this is my first project), so I'm not sure I uploaded them the right way.

chrispelkey, Feb 19 '17 21:02

@chrispelkey, did you submit a Pull Request for this?

chrisdick14, Feb 23 '17 14:02

@chrisdick14 I think I may have just done it right, finally...

chrispelkey, Feb 23 '17 15:02

Yep, I see a PR now. One of us will look at this today/tonight!

chrisdick14, Feb 23 '17 15:02