warn-scraper

OH missing earliest years from PDF

Open stucka opened this issue 2 years ago • 0 comments

The Ohio scraper has been rebuilt and most of the archives were consolidated into a single CSV for download.

However, the CSV that Big Local News had been hosting contained badly parsed data from the 2015 and 2016 PDFs, including a lot of junk characters. We could use someone to parse those two PDFs into CSV format so we can add them to our archival data.

The original PDFs are included in the ZIP, as is the then-consolidated snapshot of the CSV:

https://storage.googleapis.com/bln-data-public/warn-layoffs/oh_2015-2022.zip
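
For anyone picking this up, here is a minimal Python sketch for sorting the archive's contents into PDFs and CSVs before any parsing starts. It assumes the ZIP has already been downloaded locally; the function name `split_members` is illustrative and not something from the warn-scraper repo, and no assumption is made about what the files inside are called.

```python
import zipfile
from pathlib import PurePosixPath


def split_members(zip_source):
    """Group the archive's file names by type: PDFs, CSVs, everything else.

    zip_source may be a filesystem path or a binary file-like object.
    """
    groups = {"pdf": [], "csv": [], "other": []}
    with zipfile.ZipFile(zip_source) as archive:
        for name in archive.namelist():
            if name.endswith("/"):  # skip directory entries
                continue
            # Compare extensions case-insensitively (".PDF" and ".pdf" both count).
            suffix = PurePosixPath(name).suffix.lower().lstrip(".")
            key = suffix if suffix in ("pdf", "csv") else "other"
            groups[key].append(name)
    return groups
```

Something like `split_members("oh_2015-2022.zip")` should then show which members are the two source PDFs and which is the consolidated CSV snapshot.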

The current scraper is grabbing 2017-2022 from a CSV similar to the one in this ZIP file, except that the 2015, 2016, and 2023 data have been purged from it.
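
To reproduce that purged CSV from the snapshot, a rough Python sketch follows. This is not the scraper's actual code. The date column name is passed in as a parameter because the real header is not stated here, and the year is pulled from the first standalone four-digit number in that field, which is an assumption about how the dates are written.

```python
import csv
import io
import re

# First standalone four-digit number in a date string, e.g. "01/05/2015" -> 2015.
YEAR_PATTERN = re.compile(r"(?<!\d)(\d{4})(?!\d)")


def purge_years(csv_text, date_field, years):
    """Return CSV text without the rows whose date_field falls in `years`.

    date_field is a hypothetical column name supplied by the caller.
    Rows with no recognizable year are kept rather than silently dropped.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    out = io.StringIO()
    writer = csv.DictWriter(
        out,
        fieldnames=reader.fieldnames,
        lineterminator="\n",
        extrasaction="ignore",  # tolerate rows with more cells than headers
    )
    writer.writeheader()
    for row in reader:
        match = YEAR_PATTERN.search(row.get(date_field) or "")
        if match and int(match.group(1)) in years:
            continue
        writer.writerow(row)
    return out.getvalue()
```

Calling it with `years={2015, 2016, 2023}` and whatever the date column turns out to be named would leave only the 2017-2022 rows.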

stucka, Sep 14 '23 15:09