warn-scraper
Command-line interface for downloading WARN Act notices of qualified plant closings and mass layoffs from state government websites
I have this working and will put up a PR shortly. The PDFs for NY are all well-structured, and this information is fairly easy to pull out with regexes.
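As a rough illustration of the approach, here is a minimal sketch of pulling labeled fields out of notice text with regexes. It assumes the PDF text has already been extracted (for example with pdfplumber), and the field labels and patterns are illustrative placeholders, not the actual NY notice layout.

```python
import re

# Hypothetical field patterns; the real NY notices may label fields differently.
FIELDS = {
    "company": re.compile(r"Company:\s*(.+)"),
    "county": re.compile(r"County:\s*(.+)"),
    "employees_affected": re.compile(r"Number Affected:\s*(\d+)"),
    "notice_date": re.compile(r"Date of Notice:\s*([\d/]+)"),
}

def parse_notice(text: str) -> dict:
    """Return whichever labeled fields can be found in the extracted notice text."""
    row = {}
    for name, pattern in FIELDS.items():
        match = pattern.search(text)
        if match:
            row[name] = match.group(1).strip()
    return row
```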
Bumps [pillow](https://github.com/python-pillow/Pillow) from 10.2.0 to 10.3.0. Release notes sourced from pillow's releases: 10.3.0 (https://pillow.readthedocs.io/en/stable/releasenotes/10.3.0.html). Changes:
- CVE-2024-28219: Use strncpy to avoid buffer overflow #7928 [@hugovk]
- Use functools.lru_cache for hopper() #7912 [@hugovk]
- ...
Back when this was started, there seemed to be a dearth of data about layoffs. Today, there are several websites with layoff data, some automated with "AI," some based on...
> See the [Job Center docs](https://github.com/biglocalnews/WARN/docs/job_center.md) for background on the scraping strategy and issues described below. After cutting over to use the Job Center site class for AZ, DE, KS...
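The linked docs describe the actual implementation; as a rough sketch of the pattern, a shared site class lets every state on the same Job Center platform reuse one scraper and supply only its own state code and base URL. The class, method names, and URLs below are hypothetical placeholders, not the repo's API.

```python
from dataclasses import dataclass

@dataclass
class JobCenterSite:
    state: str
    base_url: str  # the state's Job Center WARN lookup page (placeholder)

    def page_url(self, page: int) -> str:
        """Build the URL for one page of the paginated WARN listing."""
        return f"{self.base_url}?page={page}"

    def scrape(self, pages: int = 1) -> list[dict]:
        """Fetch and parse each listing page; parsing logic is shared across states."""
        rows: list[dict] = []
        for page in range(1, pages + 1):
            url = self.page_url(page)
            # fetch url, parse the results table, append one dict per notice
        return rows

# The same class is reused for every Job Center state:
SITES = {
    "az": JobCenterSite("az", "https://example-az.gov/warn_lookups"),
    "ks": JobCenterSite("ks", "https://example-ks.gov/warn_lookups"),
}
```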
Closes #642
Closes #644
Idaho moved its WARN PDF from `https://www.labor.idaho.gov/dnn/Portals/0/Publications/WARNNotice.pdf` to `https://www.labor.idaho.gov/wp-content/uploads/publications/WARNNotice.pdf`. The scraper follows the redirect transparently, so nothing breaks, but it seems like good policy to update the URL to reflect...
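A quick way to see the redirect in action, assuming the fetch goes through requests (shown only to illustrate the behavior, not the scraper's actual code path):

```python
import requests

# The old URL still resolves; requests follows the redirect by default,
# and response.url shows where it landed.
old_url = "https://www.labor.idaho.gov/dnn/Portals/0/Publications/WARNNotice.pdf"
response = requests.get(old_url)
response.raise_for_status()
print(response.url)                     # final URL after any redirects
print(len(response.history), "redirect(s) followed")
```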
One of the project's dependencies, `tenacity`, handles retrying, but only for one state: Florida. Most retries are handled by the `retry` package as called by `utils.get_url`. In the interest of...
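For context, the `retry` package works as a decorator, roughly as in this sketch of a `get_url`-style helper; the tries/delay/backoff values are illustrative, not the project's actual settings.

```python
import requests
from retry import retry

# Retry on request errors with exponential backoff (example parameters only).
@retry(requests.RequestException, tries=3, delay=2, backoff=2)
def get_url(url: str, **kwargs) -> requests.Response:
    """Fetch a URL, raising (and retrying) on HTTP or connection errors."""
    response = requests.get(url, **kwargs)
    response.raise_for_status()
    return response
```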
This removes some code that is commented out, and some code that no longer functions because there is no longer a link for 2014. Partially addresses #633