Add scrapes for all remaining states

Open zstumgoren opened this issue 4 years ago • 4 comments

For each area without a scraper we should make a ticket and do the following:

  1. Briefly write up whether multiple sources of data exist, and why one may be preferable to another
  2. Determine whether the site is part of a broader platform (e.g. #126)
  3. Propose a scraping strategy

The areas below do not currently have a scraper:

  • [ ] Arkansas
  • [x] Colorado #65
  • [x] Georgia #63
  • [x] Hawaii #371
  • [x] Idaho #82
  • [x] Illinois #81
  • [x] Kentucky #80
  • [x] Louisiana #79
  • [ ] Massachusetts #78
  • [x] Michigan #372
  • [ ] Minnesota #75
  • [ ] Mississippi #373
  • [ ] Nevada #237
  • [ ] New Hampshire
  • [x] New Mexico #73
  • [ ] North Carolina #74
  • [ ] North Dakota
  • [ ] Pennsylvania #374
  • [x] South Carolina #69
  • [x] Tennessee #192
  • [ ] West Virginia #375
  • [ ] Wyoming
  • [ ] American Samoa
  • [ ] Guam
  • [ ] Northern Mariana Islands
  • [ ] Puerto Rico
  • [ ] Virgin Islands

zstumgoren avatar Jun 24 '21 22:06 zstumgoren

Added issues for:

  • HI #371
  • MI #372
  • MS #373
  • PA #374
  • WV #375

Found one for:

  • NV #237

Ruled out scraping in the remainder of U.S. states based on prior research. Haven't tackled territories.

chriszs avatar Jan 22 '22 21:01 chriszs

Thanks @chriszs - I've added the tix you created/dug up to the main body of this Issue. Based on your prior work tackling WARN, should we flag the remaining states (Arkansas, New Hampshire, North Dakota and Wyoming) as not offering data or as non-scrapable (i.e. they have no data on the web but we could get it through a public records request)?

zstumgoren avatar Jan 22 '22 23:01 zstumgoren

Arkansas and Wyoming weren't obtainable via FOIA. North Dakota was. New Hampshire unclear. But worth revisiting just to be sure.

chriszs avatar Jan 23 '22 00:01 chriszs

A couple additional ways to think about completeness:

  • Population - E.g. notices in states accounting for 9X% of the population. This is, I believe, how we handled only having ~47 states in a story. This is why I started #425, #427 and reopened #358, since that's the next largest state with no coverage. By that metric, #430 and #374 come next.
  • Years - Looks like you can get historical data back to 2015 in most states without too much trouble. So, I've opened #431, #432, #433 and added a comment to #72.

The combination gives you seven years of data over states with the vast majority of people, which is pretty good and is only going to get better.

That is only somewhat tempered by Bloomberg's look at completeness compared to jobless claim stats, which found that WARN notices (and the related state laws that seem to account for some of the notices) show only part of the overall picture in years with lots of layoffs (5% in a couple of key states). GAO found a similar thing in an audit featured in the piece. Important caveat, but probably out of our control.
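
For what it's worth, the population metric could be sketched roughly like this. It is a minimal sketch, not anything in the repo: `population_coverage` and `next_largest_gaps` are hypothetical helper names, and the numbers are placeholder figures, not real population counts.

```python
def population_coverage(population_by_state, covered_states):
    """Return the fraction of total population living in covered states."""
    total = sum(population_by_state.values())
    if total <= 0:
        raise ValueError("total population must be positive")
    covered = sum(
        count
        for state, count in population_by_state.items()
        if state in covered_states
    )
    return covered / total


def next_largest_gaps(population_by_state, covered_states, limit=3):
    """List states without a scraper, largest population first."""
    missing = [s for s in population_by_state if s not in covered_states]
    return sorted(missing, key=population_by_state.get, reverse=True)[:limit]


# Placeholder figures for illustration only, NOT real population counts.
toy_population = {"A": 50, "B": 30, "C": 15, "D": 5}
toy_covered = {"A", "C"}

share = population_coverage(toy_population, toy_covered)
gaps = next_largest_gaps(toy_population, toy_covered)
print(f"{share:.0%} of population covered; next to tackle: {gaps}")
```

Swapping in real per-state population counts and the set of states that actually have scrapers would give the coverage share, plus an ordering for which missing state to pick up next.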

chriszs avatar Feb 21 '22 00:02 chriszs