american-political-data-and-r
american-political-data-and-r copied to clipboard
A guide to analyzing & visualizing American political data using R. Winter 2023 Version.
American political data & R
Updated: 2020-11-09
An R-based guide to accessing, exploring & visualizing US political data via a collection of publicly available resources, including election returns for presidential and congressional races, political ideology scores for US lawmakers, and census-based characterizations of US congressional districts.
Election returns have been collated from Daily Kos, MIT Election Data and Science Lab and Wikipedia; the R packages Rvoteview & tidycensus are used extensively to characterize lawmakers/voting behavior and district demographics, respectively.
Hopefully a useful open source & transparent framework for investigating past & future election results and congresses using R. All work presented here can be reproduced in its entirety. A stand-alone html version of this guide can be downloaded here.
-
American political data & R
-
Quick preliminaries
- Some geo-spatial data
- A simple add-on map theme
- Some quick definitions
-
Data sources
- VoteView
- uspols
-
Historical presidential election
results
- Voting margins in Presidential elections since 1972
- When each state last voted for a Democratic presidential nominee
- Presidential elections and vote shares and crosses of gold
- Fall of the Blue Wall - 2016
-
Historical composition of the
Senate
- Split Senate delegations and shifting ideologies
- Split Senate delegations on the wane again
- US Senate delegations by party composition
- The end of split-ticket voting for now
- Republican Senators and a minority of Americans
-
Historical composition of the
House
- Political realignment in the South
- On the evolution of the Southern Republican
- A GIF
-
Four generations of American
lawmakers
- Trends in the average age of House members
- Shifting distributions maybe
- Watergate babies and vestiges of Obama
- Introducing Millenials and Gen Xers
-
Congressional districts and the
ACS
- Profiling congressional districts
- ACS variables and margins of victory
-
America’s White working class
- Race-work class distributions
- The White working class’ America
- White working class and rural America
- Fin
-
Quick preliminaries
Quick preliminaries
library(tidyverse)
Some geo-spatial data
State-based geo-data
library(sf)
library(tigris)
options(tigris_use_cache = TRUE, tigris_class = "sf")
nonx <- c('78', '69', '66', '72', '60', '15', '02')
states_sf <- tigris::states(cb = TRUE) %>%
rename(state_code = STATEFP, state_abbrev = STUSPS)
states <- states_sf %>%
data.frame() %>%
select(state_code, state_abbrev)
Congressional districts
uscds <- tigris::congressional_districts(cb = TRUE) %>%
select(GEOID) %>%
mutate(state_code = substr(GEOID, 1, 2),
district_code = substr(GEOID, 3, 4))
laea <- sf::st_crs("+proj=laea +lat_0=30 +lon_0=-95")
A simple add-on map theme
theme_guide <- function () {
theme(axis.title.x=element_blank(),
axis.text.x=element_blank(),
axis.title.y=element_blank(),
axis.text.y=element_blank(),
legend.title=element_blank(),
legend.position = 'none',
complete = F) }
Some quick definitions
Per VoteView definition: The South = Dixie + Kentucky + Oklahoma
south <- c('SC', 'MS', 'FL',
'AL', 'GA', 'LA', 'TX',
'VA', 'AR', 'NC', 'TN',
'OK', 'KY')
swing <- c('AZ', 'FL', 'GA', 'MI',
'MN', 'NC', 'PA', 'WI')
states_sf %>%
filter(!state_code %in% nonx) %>%
mutate(south = ifelse(state_abbrev %in% south,
'south', 'not'),
swing = ifelse(state_abbrev %in% swing,
'swing', 'not')) %>%
select(state_abbrev, geometry, south, swing) %>%
gather(-state_abbrev, -geometry, key = 'type', value = 'val') %>%
mutate(label = ifelse(!grepl('not', val), state_abbrev, NA)) %>%
sf::st_transform(laea) %>%
ggplot() +
geom_sf(aes(fill = val),
color = 'white', size = .15,
alpha = 0.65) +
ggsflabel::geom_sf_text(aes(label = label),
size = 2.25,
color='black') +
scale_fill_manual(values = c('#8faabe',
'#1a476f',
'#55752f')) +
theme_minimal() +
theme_guide() +
theme(panel.background =
element_rect(fill = '#d5e4eb', color = NA)) +
facet_wrap(~type, ncol = 1)
Data sources
VoteView
The VoteView project provides roll call-based political ideology scores for all lawmakers in the history of the US Congress. The R package
Rvoteview
provides access to these data.
## NOTE: election years. term begins year + 1
ccr <- data.frame(year = c(1786 + 2*rep(c(1:116))),
congress = c(1:116))
con <- 63 #66
vvo <- lapply(c('house', 'senate'), function(x) {
Rvoteview::download_metadata(type = 'members',
chamber = x) %>%
filter(congress > con & chamber != 'President') }) #66
## [1] "/tmp/RtmptThkYl/Hall_members.csv"
## [1] "/tmp/RtmptThkYl/Sall_members.csv"
congress <- vvo %>%
bind_rows() %>%
mutate(x = length(unique(district_code))) %>%
ungroup() %>%
mutate(district_code = ifelse(x==1, 0, district_code)) %>%
mutate(district_code =
stringr::str_pad (as.numeric(district_code),
2, pad = 0),
southerner = ifelse(state_abbrev %in% south,
'South', 'Non-south'),
party_name = case_when (party_code == 100 ~ 'Democrat',
party_code == 200 ~ 'Republican',
!party_code %in% c(100, 200) ~ 'other')) %>%
left_join(ccr, by = 'congress')
uspols
The
uspols
package is my attempt at taming publicly available US election data. The package collates data from Daily Kos, MEDSL & Wikipedia in a uniform format. Importantly, package documentation details all data transformation processes from raw data to package table. So, if you take issue with a data point, check out the documentation and let me know. Why election return data are so nebulous from an accessibility standpoint is absolutely beyond me.
library(devtools)
devtools::install_github("jaytimm/uspols")
library(uspols)
Historical presidential election results
Voting margins in Presidential elections since 1972
Historical Presidential election results by state via Wikipedia. Equal-area state geometry via Daily Kos.
mp <- uspols::xsf_TileOutv10 %>%
left_join(uspols::uspols_wiki_pres %>%
filter(year > 1971) %>%
mutate(margins = republican - democrat))
mp %>%
ggplot() +
geom_sf(aes(fill = margins),
color = 'darkgray', lwd = .15) +
geom_sf(data = uspols::xsf_TileInv10,
fill = NA,
show.legend = F,
color = NA,
lwd=.5) +
ggsflabel::geom_sf_text(data = uspols::xsf_TileInv10,
aes(label = state_abbrev),
size = 1.5,
color='black') +
scale_fill_distiller(palette = "RdBu",
limit = max(abs(mp$margins)) * c(-1, 1)) +
facet_wrap(~year, ncol = 4) +
theme_minimal()+ theme_guide() +
labs(title = "Voting margins in Presidential elections since 1972")
When each state last voted for a Democratic presidential nominee
clean_prex <- uspols::uspols_wiki_pres %>%
mutate(winner = gsub('Franklin D. Roosevelt', 'FDR', winner),
winner = gsub('Lyndon B. Johnson', 'LBJ', winner),
winner = gsub('Hillary Clinton', 'HRC', winner))
last_dem <- clean_prex %>%
group_by(state_abbrev, party_win) %>%
filter(year == max(year),
party_win == 'democrat') %>%
ungroup() %>%
mutate(lab = paste0(year, ' - ', winner))
Nine US states have not voted for a Democratic Presidential candidate since LBJ.
new1 <- uspols::xsf_TileInv10 %>%
left_join(last_dem, by ='state_abbrev') %>%
mutate(label = paste0(state_abbrev,
'\n',
year,
'\n',
gsub('^.* ', '', winner)))
uspols::xsf_TileOutv10 %>%
left_join(last_dem, by ='state_abbrev') %>%
ggplot() +
geom_sf(aes(fill = winner),
color = 'white' ,
alpha = .65) +
ggsflabel::geom_sf_text(data = new1,
aes(label = new1$label),
size = 3,
color = 'black') +
theme_minimal() +
theme_guide() +
theme(legend.position = 'none') +
ggthemes::scale_fill_economist()+
labs(title = "When each state last voted for a Democratic presidential nominee")
Presidential elections and vote shares and crosses of gold
The Deep South’s collective enthusiasm for FDR in the 30’s has only been matched historically by the Mountain West’s enthusiasm for William Jennings Bryant in the election of 1896. Just to say.
vote_share <- clean_prex %>%
select(-party_win) %>%
gather(key = 'party', value = 'per', democrat:republican) %>%
group_by(state_abbrev) %>%
slice(which.max(per)) %>%
ungroup() %>%
mutate(label = paste0(year, ' - ', winner))
new <- uspols::xsf_TileInv10 %>%
left_join(vote_share, by ='state_abbrev') %>%
mutate(per = round(per, 1)) %>%
mutate(label = paste0(state_abbrev,
'\n',
year,
'\n',
gsub('^.* ', '', winner),
'\n',
per))
uspols::xsf_TileOutv10 %>%
left_join(vote_share, by ='state_abbrev') %>%
ggplot() +
geom_sf(aes(fill = winner),
color = 'white' ,
alpha = .65) +
ggsflabel::geom_sf_text(data = new,
aes(label = new$label),
size = 2.5,
color = 'black') +
scale_fill_manual(
values = colorRampPalette(ggthemes::economist_pal()(8))(14)) +
theme_minimal() + theme_guide() +
theme(legend.position = 'none') +
labs(title = "Largest vote share for Presidential nominee",
subtitle = "By state since 1864")
Fall of the Blue Wall - 2016
And the disdain of Utah.
counties <- tigris::counties(cb = TRUE) %>%
filter(!STATEFP %in% nonx) %>%
sf::st_transform(laea)
A county-level perspective: Trump margins in 2016 minus Romney margins in 2012.
deltas <- uspols::medsl_pres_county %>% ## address this
filter(!is.na(GEOID)) %>%
filter(year %in% c(2012, 2016)) %>%
mutate(year = paste0('X', year),
margins = republican - democrat) %>%
select(GEOID, year, margins) %>%
spread(year, margins) %>%
mutate(delta = X2016 - X2012)
counties %>%
left_join(deltas, by = 'GEOID') %>%
ggplot() +
geom_sf(aes(fill = delta),
color = 'gray',
size = .25) +
scale_fill_distiller(palette = "RdYlBu",
limit = max(abs(deltas$delta)) * c(-1, 1)) +
theme_minimal() +
theme(axis.title.x=element_blank(),
axis.text.x=element_blank(),
axis.title.y=element_blank(),
axis.text.y=element_blank(),
legend.position = 'bottom') +
labs(title = 'The Fall of the Blue Wall (ca. 2016)')
Historical composition of the Senate
Split Senate delegations and shifting ideologies
A Senate delegation for a given state is said to be split when comprised of Senators from different parties, eg, one Republican and one Democrat – as is the case with, eg, West Virginia in the (present) 116th Congress.
sens <- congress %>%
filter(chamber == 'Senate') %>%
mutate(party_name = as.factor(party_name)) %>%
mutate(party_name = forcats::fct_relevel(party_name,
'other',
after = 2)) %>%
mutate(year = year + 1) %>%
group_by(year, congress, state_abbrev) %>%
slice(1:2) %>%
ungroup()
sens2 <- sens %>%
filter(congress %in% c(68, 74, 80,
86, 92, 98,
104, 110, 116)) %>%
group_by(year, congress, state_abbrev) %>%
arrange (party_name) %>%
mutate(layer = row_number()) %>%
ungroup()
uspols::xsf_TileOutv10 %>%
left_join(sens2 %>% filter(layer == 2)) %>%
ggplot() +
geom_sf(aes(fill = party_name),
color = 'white',
lwd = 0.2,
alpha = .85) +
geom_sf(data = uspols::xsf_TileInv10 %>%
left_join(sens2 %>% filter (layer == 1)),
aes(fill = party_name),
color = 'white',
lwd = 0.2,
alpha = .7) +
ggsflabel::geom_sf_text(data = uspols::xsf_TileInv10,
aes(label = state_abbrev),
size = 1.55,
color = 'white') +
ggthemes::scale_fill_stata()+
theme_minimal() +
theme_guide() +
theme(legend.position = 'bottom') +
facet_wrap(~year + congress) +
labs(title = "Senate composition by state since 1923",
caption = 'Data sources: Daily Kos & VoteView')
Split Senate delegations on the wane again
split_senate <- sens %>%
group_by(year, congress, state_abbrev) %>%
summarize(splits = length(unique(party_name)),
parts = paste0(party_name, collapse = '-')) %>%
mutate(parts = ifelse(splits == 2,
'Split',
paste0('Both ',
gsub('-.*$', '', parts))))
split_senate$parts <- factor(split_senate$parts,
levels = c('Both Democrat',
'Split',
'Both Republican',
'Both other'))
split_senate %>%
filter(splits == 2) %>%
group_by(year, congress) %>%
summarize(n = n()) %>%
ggplot() +
geom_bar(aes(x = year,
y = n),
color = 'white',
fill = 'steelblue',
stat = 'identity') +
theme_minimal() +
theme(axis.text.x = element_text(angle = 90, hjust = 1),
axis.title.x=element_blank()) +
scale_x_continuous(breaks = seq(1915, 2019, 4)) +
labs(title = "Split Senate delegations since 1915")
US Senate delegations by party composition
split_pal <- c('#395f81', '#ead8c3', '#9e5055', '#b0bcc1')
split_senate %>%
group_by(year, congress, parts) %>%
summarize(n = n()) %>%
ggplot(aes(x = year,
y = n,
fill = parts))+
geom_bar(alpha = 0.85,
color = 'gray',
lwd = .25,
stat = 'identity') +
theme_minimal() +
theme(axis.text.x = element_text(angle = 90, hjust = 1))+
theme(legend.position = "bottom",
legend.title=element_blank())+
scale_fill_manual(values = split_pal) +
scale_x_continuous(breaks = seq(1915, 2019, 4)) +
xlab('') +
ggtitle('US Senate delegations, by party composition')
The end of split-ticket voting for now
Split-ticket voting is when a voter casts votes for candidates of different parties across multiple offices in a given election. Split-ticket voting contrasts with straight-ticket voting. Here, we consider one instance of split-ticket voting: President-Senator splits at the state-level.
splits <- uspols::uspols_wiki_pres %>% ###
rename(party_pres = party_win) %>%
filter(year >= 1976) %>%
select(year, state_abbrev, party_pres) %>%
inner_join(uspols::uspols_medsl_senate, ###
by = c('year', 'state_abbrev')) %>%
mutate(split = ifelse(party_pres != party_win, 1, 0)) %>%
complete(year, state_abbrev)
#mutate(split = ifelse(is.na(split), 0, split))
year <- seq(from = 1976, to = 2016, by = 2)
class <- paste0('Class-', c(rep(1:3,7)))
df <- data.frame(year, class)
uspols::xsf_TileOutv10 %>%
left_join(splits, by = 'state_abbrev') %>%
left_join(df, by = 'year') %>%
filter(year > 1980) %>% # cleaner viz
mutate (split1 = case_when (split == 0 ~ 'Straight-ticket',
split == 1 ~ 'Split-ticket',
is.na(split) ~ 'No-contest')) %>%
ggplot() +
geom_sf(aes(fill = split1),
color = 'black',
lwd = .15,
alpha = 0.65) +
ggsflabel::geom_sf_text(data = uspols::xsf_TileInv10,
aes(label = state_abbrev),
size = 1.75,
color = 'black') +
scale_fill_manual(values = c('#8faabe', '#55752f',
'#dae2ba')) + #,
facet_wrap(~year + class) +
theme_minimal() +
theme_guide() +
theme(legend.position = 'bottom') +
labs(title = "Pres-Senate split-tickets per general election year")
Republican Senators and a minority of Americans
Yearly populations by state scraped from FRED Economic Data, and made avaliable as a table in the
sometables
package.
library(devtools)
devtools::install_github("jaytimm/sometables")
library(tables)
Senate seats held by Republicans via VoteView; population of states represented by Reublican senators via FRED.
wpops <- sens %>% #yy %>%
left_join(sometables::pop_us_states,
by = c('year', 'state_abbrev')) %>% #yy %>%
group_by(year, party_name) %>%
summarize(n = n(),
#n = sum(n),
pop = sum(pop)) %>%
group_by(year) %>%
mutate(Senate_share = round(n/sum(n) * 100, 1),
Population_share = round(pop/sum(pop) * 100, 1)) %>%
filter(party_name == 'Republican') %>%
select(year, Senate_share, Population_share) %>%
gather(-year, key = 'var', value = 'per')
Gray highlight: Congresses in which (1) GOP senators hold a majority in the Senate AND (2) a minority of Americans are represented by a Repbulican senator.
wpops %>%
ggplot() +
geom_rect(aes(xmin = 2015,
xmax = 2019,
ymin = -Inf,
ymax = Inf),
fill = 'lightgray') +
geom_hline(yintercept = 50, color = 'black', lwd = .2) +
geom_line(aes(x = year,
y = per,
color = var),
size = 1) +
ggthemes::scale_color_few()+
theme_minimal() +
theme(legend.position = 'bottom',
axis.text.x = element_text(angle = 90),
axis.title.x = element_blank(),
legend.title = element_blank()) +
scale_x_continuous(breaks = seq(min(wpops$year), max(wpops$year), 4)) +
labs(subtitle = "Republican Senate share v. Share Americans represented by Republican senator")
Historical composition of the House
congress_south <- congress %>%
filter(party_code %in% c(100, 200), chamber == 'House') %>%
mutate(Member = as.factor(paste0(party_name, ', ', southerner))) %>%
## re-factor
mutate(Member = forcats::fct_relevel(Member,
'Republican, Non-south',
after = 3))
Political realignment in the South
congress_south %>%
group_by(year, Member) %>%
summarize(n = n()) %>%
mutate(n = n/sum(n)) %>%
ggplot(aes(x = year+1,
y = n,
fill = Member)) +
geom_area(alpha = 0.65, color = 'gray') +
geom_hline(yintercept = 0.5, color = 'white', linetype = 2) +
scale_x_continuous(breaks=seq(min(congress_south$year+1),
max(congress_south$year+ 1), 4)) +
scale_fill_manual(values = c('#1a476f', '#8faabe',
'#e19463', '#913a40')) +
theme_minimal() +
theme(legend.position = 'top',
legend.title=element_blank(),
axis.title.y=element_blank(),
axis.title.x=element_blank(),
axis.text.y=element_blank(),
axis.text.x = element_text(angle = 90, hjust = 1)) +
labs(title = "House composition since 1915")
On the evolution of the Southern Republican
DW-NOMINATE ideal points in two dimensions. The first dimension captures ideological variation based in the standard liberal-conservative divide. The second captures variation based in social conservatism that crosscuts political affiliation.
congress_south %>%
mutate(year = year + 1) %>%
filter (congress %in% c(84, 88, 92,
96, 100, 104,
108, 112, 116)) %>%
ggplot(aes(x = nominate_dim1,
y = nominate_dim2) ) +
annotate("path",
x=cos(seq(0,2*pi,length.out=300)),
y=sin(seq(0,2*pi,length.out=300)),
color='gray',
size = .25) +
geom_point(aes(color = Member),
size= 1.25,
shape = 17) +
scale_color_manual(values = c('#1a476f', '#8faabe',
'#e19463', '#913a40')) +
facet_wrap(~year + congress) +
theme_minimal() +
theme(legend.title=element_blank(),
legend.position = 'bottom') +
labs(title="The evolution of the Southern Republican",
subtitle = 'In two dimensions: from 1955 to 2019')
A GIF
anim <- congress_south %>%
ggplot(aes(x = nokken_poole_dim1,
y = nokken_poole_dim2) ) +
annotate("path",
x=cos(seq(0,2*pi,length.out=300)),
y=sin(seq(0,2*pi,length.out=300)),
color='gray',
size = .25) +
geom_point(aes(color = Member),
size= 2.25,
shape = 17) +
scale_color_manual(values = c('#1a476f', '#8faabe',
'#e19463', '#913a40')) +
theme_minimal() +
theme(legend.title=element_blank(),
legend.position = 'bottom') +
labs(title = "Congress: {frame_time}") +
gganimate::transition_time(as.integer(congress)) +
gganimate::ease_aes('linear')
gganimate::animate(anim,
fps = 4,
nframes = 50,
height = 400,
width = 550)
# gganimate::anim_save("ideologies.gif")
Four generations of American lawmakers
Pew Research generations:
- Millenials: 1981-1997
- Generation X: 1965 -1980
- Baby Boomers: 1946-1964
- Silent: 1928-1945
- Greatest: < 1928
Trends in the average age of House members
congress %>%
mutate(age = year - born) %>%
filter (party_code %in% c('100', '200'), year > 1960) %>%
group_by(party_name, year) %>%
summarize(age = round(mean(age, na.rm = T), 1)) %>%
mutate(label = if_else(year == max(year) | year == min(year),
age, NULL)) %>%
ggplot() +
geom_line(aes(x = year + 1,
y = age,
color = party_name),
size = .8) +
ggrepel::geom_text_repel(aes(x = year + 1,
y = age,
label = label),
size= 3.25,
nudge_x = 1,
na.rm = TRUE) +
ggthemes::scale_color_stata()+
theme_minimal() +
theme(legend.position = 'none',
axis.text.x = element_text(angle = 90, hjust = 1),
axis.title.x=element_blank()) +
scale_x_continuous(breaks=seq(1963, 2019, 2)) +
labs(title = "Average age of congress members by party")
Shifting distributions maybe
congress %>%
mutate(age = year - born,
year = year + 1) %>%
filter (party_code %in% c('100', '200'), year > 2007) %>%
## 100 == democrat --
ggplot(aes(age,
fill = party_name)) +
ggthemes::scale_fill_stata()+
theme_minimal() +
theme(legend.position = "none",
axis.title.y=element_blank(),
axis.text.y=element_blank()) +
geom_density(alpha = 0.6, color = 'gray') +
facet_wrap(~year, nrow = 2)+
labs(title="Age distributions in the House since 2009, by party")
Watergate babies and vestiges of Obama
freshmen1 <- congress %>%
group_by(icpsr, bioname, party_name) %>%
summarize(min = min(year),
max = max(year)) %>%
group_by(min, party_name) %>%
summarise(count = n()) %>%
ungroup() %>%
filter(min > 1960, party_name != 'other')
labs <- freshmen1 %>%
arrange(desc(min)) %>%
top_n(4, count) %>%
mutate(txt = c('Obama 1st midterm',
'Clinton 1st midterm',
'"Watergate babies"',
'LBJ atop ticket'))
freshmen1 %>%
ggplot() +
geom_line(aes(x = min + 1,
y = count,
color = party_name),
size = 0.8) +
geom_text(data = labs,
aes(x = min,
y = count,
label = txt),
size = 3, nudge_y = 3) +
ggthemes::scale_color_stata()+
theme_minimal() +
theme(legend.position = 'none',
axis.title.x=element_blank(),
axis.text.x = element_text(angle = 90, hjust = 1)) +
scale_x_continuous(breaks=seq(1963,2019,2)) +
labs(title = "Freshman House members by party")
Introducing Millenials and Gen Xers
freshmen <- congress %>%
group_by(icpsr, bioname) %>%
mutate(n = length(congress)) %>%
ungroup() %>%
filter(congress == 116) %>% # only correct here -- not older congresses
mutate (Class = case_when (n == 1 ~ 'Freshmen',
n == 2 ~ 'Sophmores',
n > 2 ~ 'Upper-class')) %>%
select(icpsr, party_code, Class)
gens <- sometables::pew_generations %>%
mutate(age = 2020 - end_yr) %>%
filter(order %in% c(2:5))
congress %>%
filter (party_code %in% c('100', '200'),
congress == 116) %>%
mutate(age = year - born,
party_code = ifelse(party_code == '100',
'House Democrats',
'House Republicans')) %>%
left_join(freshmen %>% select(-party_code), by = "icpsr") %>%
## 100 == democrat --
ggplot() +
geom_dotplot(aes(x = age,
color = Class,
fill = Class),
method="histodot",
dotsize = .9,
binpositions = 'all',
stackratio = 1.3,
stackgroups=TRUE,
binwidth = 1) +
geom_vline(xintercept =gens$age - 0.5,
linetype =2,
color = 'black',
size = .25) +
geom_text(data = gens,
aes(x = age + 2.25,
y = 0.95,
label = generation),
size = 3) +
theme_minimal() +
ggthemes::scale_fill_economist() +
ggthemes::scale_color_economist() +
facet_wrap(~party_code, nrow = 2) +
theme(legend.position = "bottom",
axis.title.y=element_blank(),
axis.text.y=element_blank()) +
#ylim (0, .5) +
labs(title = "Age distribution of the 116th House by party")
Congressional districts and the ACS
The US Census/American Community Survey (ACS) make counts/estimates available by congressional district. The R package
tidycensus
provides very clean access to census APIs. Via super convenient ACS Data Profiles.
variable_list <- c(bachelors_higher = 'DP02_0068P',
foreign_born = 'DP02_0093P',
hispanic = 'DP05_0071P',
median_income = 'DP03_0062',
unemployed = 'DP03_0005P',
white = 'DP05_0077P',
black = 'DP05_0078P',
over_65 = 'DP05_0024P',
hs_higher = 'DP02_0067P',
non_english_home = 'DP02_0113P',
computer_home = 'DP02_0152P',
internet_home = 'DP02_0153P')
gen <- tidycensus::get_acs(geography = 'congressional district',
variables = variable_list,
year = 2019,
survey = 'acs1',
geometry = F) %>%
mutate(state_code = substr(GEOID, 1, 2),
district_code = substr(GEOID, 3, 4)) %>%
left_join(states, by = c('state_code')) %>%
select(state_abbrev, district_code, variable, estimate, moe)
Profiling congressional districts
Density plots for 12 ACS variables per 435 US congressional districts. Details for New Mexico’s 2nd district summarized in plot below as dashed lines.
base_viz <- gen %>%
ggplot( aes(estimate, fill = variable)) +
geom_density(alpha = 0.65,
color = 'darkgray',
adjust = 2.5) +
scale_fill_manual(
values = colorRampPalette(ggthemes::economist_pal()(6))(12)) +
facet_wrap(~variable, scale = 'free', ncol = 4)+
theme_minimal() +
theme(legend.position = "none",
axis.text.y=element_blank(),
axis.title.x=element_blank(),
axis.title.y=element_blank())
nm02 <- gen %>%
filter(state_abbrev == 'NM' & district_code == '02')
base_viz +
geom_vline (data = nm02,
aes(xintercept=estimate),
linetype = 2) +
labs(title = "A 2019 demographic profile",
subtitle = "New Mexico's 2nd District")
ACS variables and margins of victory
A multi-panel summary of relationships between ACS variables and 2016 Trump margins.
gen %>%
left_join(uspols::uspols_dk_pres,
by = c("state_abbrev", "district_code")) %>%
filter(year == 2016) %>%
mutate(Trump_margins = republican - democrat) %>%
ggplot(aes(y = Trump_margins,
x = estimate,
color = variable))+
geom_point(size =1) + #
geom_smooth(method="loess", se=T, color = 'black', linetype = 3) +
scale_color_manual( values = colorRampPalette(ggthemes::economist_pal()(6))(12)) +
theme_minimal() +
theme(legend.position = "none")+
facet_wrap(~variable, scales = 'free_x') +
labs(title = "2019 ACS estimates vs. 2016 Trump margins")
America’s White working class
White working class formalized in US Census terms: Population 25 years & older who (1) identify as both White & non-Hispanic and (2) have not obtained a Bachelor’s degree (or higher). Per Table C15002: Sex by educational attainment for the population 25 years and over.
Note: In previous version of this document, I did not correctly count the White working class.
white_ed_vars <- c(white_m_bach = 'C15002H_006',
white_w_bach = 'C15002H_011',
white_pop = 'C15002H_001',
all_w_xbach = 'C15002_016',
all_w_xgrad = 'C15002_017',
all_m_xbach = 'C15002_008',
all_m_xgrad = 'C15002_009',
all_pop = 'C15002_001')
Categories include:
- White & non-Hispanic with college degree,
- White & non-Hispanic without college degree,
- Non-White and/or Hispanic with college degree, and
- Non-White and/or Hispanic college degree.
Note: The “and/or Hispanic” piece is slightly confusing here. For most people, Hispanic = Brown = Race; from this perspective, ethnicity (as distinct from race) is not a meaningful distinction. We include it here because it is included in the census.
white_ed <- tidycensus::get_acs(geography = 'congressional district',
variables = white_ed_vars,
year = 2019,
survey = 'acs1') %>%
select(-moe) %>%
spread(key = variable, value = estimate) %>%
mutate(white_college =
white_m_bach +
white_w_bach,
white_working =
white_pop -
white_college,
non_white_college =
all_m_xbach +
all_w_xbach +
all_m_xgrad +
all_w_xgrad -
white_college,
non_white_working =
all_pop -
white_pop -
non_white_college) %>%
select(GEOID, white_college:non_white_working) %>%
gather(-GEOID, key = 'group', value = 'estimate') %>%
group_by(GEOID) %>%
mutate(per = round(estimate/ sum(estimate) * 100, 1)) %>%
ungroup() %>%
mutate(state_code = substr(GEOID, 1, 2),
district_code = substr(GEOID, 3, 4)) %>%
left_join(states, by = c('state_code')) %>%
select(GEOID, state_code, state_abbrev,
district_code, group, per, estimate)
Race-work class distributions
set.seed(99)
samp_n <- sample(unique(white_ed $GEOID), 12)
white_ed %>%
filter(GEOID %in% samp_n) %>%
ggplot(aes(area = per,
fill = group,
label = gsub('_', '-', toupper(group)),
subgroup = group))+
treemapify::geom_treemap(alpha=.65)+
treemapify::geom_treemap_subgroup_border(color = 'white') +
treemapify::geom_treemap_text(colour = "black",
place = "topleft",
reflow = T,
size = 8) +
scale_fill_manual(values = c('#8faabe', '#1a476f',
'#dae2ba', '#55752f')) +
theme_minimal() +
facet_wrap(~paste0(state_abbrev, '-', district_code)) +
theme(legend.position = "none") +
labs(title = "Race-work class distributions",
caption = 'Source: ACS 1-Year estimates, 2019, Table C15002')
The White working class’ America
A state-based map in pie charts. Race-work class counts for states based on aggregate of congressional district counts.
white_ed2_state <- white_ed %>%
group_by(state_abbrev, group) %>%
summarise(estimate = sum(estimate)) %>%
mutate(per = round(estimate/sum(estimate) * 100, 1)) %>%
ungroup()
Calculate centroids for the equal-area state geometry.
cents <- sf::st_centroid(uspols::xsf_TileOutv10) %>%
sf::st_coordinates() %>%
data.frame() %>%
bind_cols(state_abbrev = uspols::xsf_TileOutv10$state_abbrev,
FIPS = uspols::xsf_TileOutv10$FIPS) %>%
left_join(white_ed2_state %>%
select(-estimate) %>%
spread(group, per))
ggplot() +
geom_sf(data = uspols::xsf_TileOutv10,
color = 'gray') +
scatterpie::geom_scatterpie(data = cents,
aes(x = X,
y = Y,
group = state_abbrev,
r = .17),
cols = colnames(cents)[5:8],
color = 'white',
alpha = 0.75) +
ggsflabel::geom_sf_text(data = uspols::xsf_TileInv10,
aes(label = state_abbrev),
size = 3,
color='black') +
scale_fill_manual(values = c('#8faabe', '#1a476f',
'#dae2ba', '#55752f')) +
theme_minimal()+
theme(axis.text.x=element_blank(),
axis.text.y=element_blank(),
axis.title.x=element_blank(),
axis.title.y=element_blank(),
legend.position = 'bottom',
legend.title=element_blank()) +
labs(title = "Race-work class distributions in America",
caption = 'Source: ACS 1-Year estimates, 2019, Table C15002')
White working class and rural America
Degree of “rurality” operationalized as the size/geographic-area of a given congressional district (in log sq meters).
uscds <- uscds %>%
mutate(CD_AREA = round(log(as.numeric(
gsub(' m^2]', '', sf::st_area(.)))), 2))
bp <- white_ed %>%
select(GEOID, state_abbrev, district_code, group, per) %>%
filter(group == 'white_working') %>%
left_join(uscds %>% select(-district_code), by = 'GEOID') %>%
left_join(uspols::uspols_dk_pres %>% filter(year == 2016),
by = c("state_abbrev", "district_code")) %>%
filter(CD_AREA < 27) %>%
mutate(GEO = ifelse(state_abbrev %in% south,
'Southern CD', 'Non-southern CD'),
Trump_margins = republican - democrat) %>%
filter(!is.na(Trump_margins))
bp %>%
ggplot(aes(y = per,
x = CD_AREA,
shape = GEO,
color = Trump_margins))+
geom_point(size = 2.75) +
scale_color_distiller(palette = "RdYlBu",
limit = max(abs(bp$Trump_margins)) * c(-1, 1)) +
#theme_minimal() +
xlab('Congressional district area (log sq meters)') +
ylab('% White working class') +
theme(legend.position = "bottom") +
labs(title = "Degree of rurality ~ % White working ~ 2016 Trump margins")