warn-scraper

Command-line interface for downloading WARN Act notices of qualified plant closings and mass layoffs from state government websites

Results 64 warn-scraper issues

I have this working and will put up a PR shortly. The PDFs for NY are all well-structured, and this information is fairly easy to pull out with regexes.

Bumps [pillow](https://github.com/python-pillow/Pillow) from 10.2.0 to 10.3.0. Release notes (sourced from pillow's releases), 10.3.0: https://pillow.readthedocs.io/en/stable/releasenotes/10.3.0.html. Changes: CVE-2024-28219: Use strncpy to avoid buffer overflow #7928 [@hugovk]; Use functools.lru_cache for hopper() #7912 [@hugovk]...

dependencies

Back when this was started, there seemed to be a dearth of data about layoffs. Today, there are several websites with layoff data, some automated with "AI," some based on...

> See the [Job Center docs](https://github.com/biglocalnews/WARN/docs/job_center.md) for background on the scraping strategy and issues described below. After cutting over to use the Job Center site class for AZ, DE, KS...

data quality

Idaho moved its WARN PDF from `https://www.labor.idaho.gov/dnn/Portals/0/Publications/WARNNotice.pdf` to `https://www.labor.idaho.gov/wp-content/uploads/publications/WARNNotice.pdf`. The scraper follows the redirect transparently, so nothing breaks, but it seems like good policy to update the URL to reflect...

One of the project's dependencies, `tenacity`, handles retrying, but only for one state: Florida. Most retries are handled by the `retry` package as called by `utils.get_url`. In the interest of...
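For readers unfamiliar with the pattern, here is a minimal sketch of what a retry decorator does, written with the standard library only. It is not the project's code and does not reproduce the API of `tenacity` or `retry`; the names `retry`, `flaky_fetch`, and the parameters `tries`, `delay`, and `backoff` are assumptions made up for illustration.

```python
import functools
import time


def retry(tries=3, delay=0.0, backoff=2.0, exceptions=(Exception,)):
    """Hypothetical stand-in for a retry decorator; not the project's code."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            wait = delay
            for attempt in range(1, tries + 1):
                try:
                    return func(*args, **kwargs)
                except exceptions:
                    if attempt == tries:
                        raise  # out of attempts: surface the last error
                    time.sleep(wait)  # pause before the next attempt
                    wait *= backoff   # lengthen the pause each time
        return wrapper
    return decorator


# Simulated fetch that fails twice before succeeding (no network involved).
calls = {"count": 0}


@retry(tries=3, delay=0.0, exceptions=(ConnectionError,))
def flaky_fetch(url):
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("simulated transient failure")
    return f"fetched {url}"


result = flaky_fetch("https://example.com/warn.pdf")
```

Routing every state through a single wrapper of this kind would presumably keep retry behavior in one place, whichever library ends up providing it.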

This removes some commented-out code, along with code that no longer works because there is no link for 2014. Partially addresses #633.