Mike Stucka
At least one more data point showing the scraper is working through GitHub Actions. I'm going to close this now, but please reopen it if you see a problem. I...
@Kirkman has confirmed in another venue he continues to get blocked from two IP addresses, and I got blocked from one of two addresses I tested. Reopening.
I've seen some intermittent failures, but it's generally working. The idea of working with Google's cached files as a suboptimal workaround is no longer even suboptimal, as that's no longer...
The endpoint shows 49,397 layoffs from 2019. The BLN Missouri file (which may include things not scraped) shows 72,761 total, per Excel. This is a great opportunity for some extra QA!
QA needed. The BLN version seems to show 364 entries, including combined rows for at least some of the revision entries. The /all endpoint seems to show 327 entries with separate rows...
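For that QA pass, a rough comparison along these lines might help. This is only a sketch: the file name, endpoint URL, and the `notice_id` column are placeholders, not the project's actual paths or schema.

```python
# Rough QA sketch: compare entry counts between the BLN export and the /all
# endpoint. File path, URL, and column names below are placeholders.
import pandas as pd
import requests

bln = pd.read_csv("missouri_bln_export.csv")  # placeholder file name
api = pd.DataFrame(requests.get("https://example.com/all").json())  # placeholder URL

print(f"BLN rows: {len(bln)}, endpoint rows: {len(api)}")

# Rows present in one source but not the other, keyed on a shared identifier.
# Assumes both sources expose a comparable 'notice_id' column, which may not be true.
only_bln = set(bln["notice_id"]) - set(api["notice_id"])
only_api = set(api["notice_id"]) - set(bln["notice_id"])
print(f"Only in BLN: {len(only_bln)}, only in endpoint: {len(only_api)}")
```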
Flagging @kirkman instead of the other person I flagged by accident. I need sleep.
Lotsa duplicates for some reason in the BLN data. If I drop the obvious duplicates I get back to 52,379 layoffs among 256 entries, so it's close to the state's...
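The dedup check I ran was essentially the following; treat it as a sketch, since the file name and the layoff-count column are guesses rather than the real schema.

```python
# Rough dedup check on the BLN data; column names are guesses and may differ.
import pandas as pd

df = pd.read_csv("missouri_bln_export.csv")  # placeholder file name
deduped = df.drop_duplicates()               # drops exact duplicate rows only

print(f"{len(df)} rows before, {len(deduped)} rows after dropping exact dupes")
# Assumes a numeric layoff-count column; the real column name may differ.
print(f"Total layoffs after dedup: {deduped['number_of_employees_affected'].sum():,}")
```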
To clarify: "It may not affect anything to permanently add encoding="utf-8" to these file operations, but I'd want to test that better first." -- I'd meant on Linux and Macs....
Looks like the encoding default is indeed UTF-8 on Linux and Mac, but cp1252 on Windows. Fixing the libraries would be easy. https://peps.python.org/pep-0686/
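If we go the explicit-encoding route, the change is a one-liner wherever we open files. A minimal sketch, with illustrative paths and function names rather than the library's actual code:

```python
# Sketch of the explicit-encoding fix: pass encoding="utf-8" on every open()
# so Windows doesn't fall back to its locale default (usually cp1252).
# Function names and paths here are illustrative, not the real library code.
from pathlib import Path

def write_text_file(path: Path, text: str) -> None:
    # Without encoding="utf-8", Windows uses the locale encoding, which can
    # mangle or reject non-ASCII characters in company names, addresses, etc.
    with open(path, "w", encoding="utf-8", newline="") as fh:
        fh.write(text)

def read_text_file(path: Path) -> str:
    with open(path, "r", encoding="utf-8") as fh:
        return fh.read()
```

An alternative to touching every call is enabling UTF-8 mode (`PYTHONUTF8=1` or `python -X utf8`), which changes the default interpreter-wide; that's roughly what PEP 686 will eventually make standard.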
... or should that be part of the setup.py install process?