Mike Stucka issues

Results: 66 issues of Mike Stucka

Closing out several small problems

Closes out https://github.com/jdf/haikufinder/issues/4 -- Documentation doesn't say how to actually get NLTK working, or how to actually install this. Fixes a typo in the readme, also suggests the github repo...

Documentation needs a fix

It's missing the nltk data we need:

    pip install nltk
    python
    import nltk
    nltk.download('punkt')
    exit()

Blank space problems in __init__.py

A mix of tabs and spaces throws off Python 3.5. Started around line 76.
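One way to locate this kind of problem is to scan each line's leading whitespace and record whether it uses tabs, spaces, or both. The sketch below is only an illustration; the function names `indentation_report` and `uses_both_styles` are hypothetical and not part of the repo. It also does not parse Python, so indented lines inside multi-line strings are reported too.

```python
def indentation_report(source):
    """Map 1-based line numbers to 'tabs', 'spaces', or 'mixed' for indented lines."""
    report = {}
    for number, line in enumerate(source.splitlines(), start=1):
        if not line.strip():
            continue  # skip blank and whitespace-only lines
        # Leading whitespace is whatever lstrip() would remove.
        indent = line[: len(line) - len(line.lstrip(" \t"))]
        if not indent:
            continue  # unindented line
        has_tab = "\t" in indent
        has_space = " " in indent
        if has_tab and has_space:
            report[number] = "mixed"
        elif has_tab:
            report[number] = "tabs"
        else:
            report[number] = "spaces"
    return report


def uses_both_styles(source):
    """Return True if the source indents with tabs in one place and spaces in another."""
    kinds = set(indentation_report(source).values())
    return "mixed" in kinds or {"tabs", "spaces"} <= kinds
```

Running `indentation_report` over the file's text and looking at the entries from about line 76 onward should show where the indentation style changes.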

Small DC scraper problems

The scraper assumes a link for 2014 is listed but echoes a different year. For 2024's index, at least, that is not the case; no link to 2014 is offered,...

Document Node upgrade routines

@chriszs helped make me aware that documentation for Node upgrades in this repo is missing. This is some ad-hoc documentation copy-pasted from an issue for a private repo that's related...

Build automated QA checks

As @Kirkman found in #597, a scraper can stop producing output without triggering an error in the workflow. While a few states keep WARN and non-WARN layoffs in the same...
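A minimal sketch of one possible check, assuming each scraper writes a CSV with a header row into a shared export directory. The directory layout, the `min_rows` threshold, and the function names are all assumptions for illustration, not the project's actual design.

```python
import csv
from pathlib import Path


def count_data_rows(csv_path):
    """Count non-empty rows after the header; a missing file counts as zero."""
    path = Path(csv_path)
    if not path.exists():
        return 0
    with path.open(newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle)
        next(reader, None)  # skip the header row, if there is one
        return sum(1 for row in reader if any(cell.strip() for cell in row))


def find_suspect_outputs(export_dir, min_rows=1):
    """Return {filename: row_count} for CSVs with fewer than min_rows data rows."""
    suspects = {}
    for path in sorted(Path(export_dir).glob("*.csv")):
        rows = count_data_rows(path)
        if rows < min_rows:
            suspects[path.name] = rows
    return suspects
```

A workflow step could call `find_suspect_outputs` after the scrapers run and exit non-zero when the result is not empty, so that a silent failure becomes a visible one.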

Discontinue old CA scraping

CA scraper is parsing PDFs from 2015, and not surprisingly is the slowest-running scraper of the bunch.

MO QA needed, with optimization potential

There may be an undocumented endpoint in Missouri that allows all years to be scraped on a single hit: https://jobs.mo.gov/warn/all This would need a modicum of testing to ensure we're...

GA scraper needs explicit timeouts

Probably most of the scrapers do, but: requests.exceptions.ConnectionError: HTTPSConnectionPool(host='www.tcsg.edu', port=443): Max retries exceeded with url: /warn-public-view/ (Caused by NewConnectionError(': Failed to establish a new connection: [Errno 110] Connection timed out'))...
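As a rough sketch of what an explicit timeout could look like: `requests` accepts a `timeout` argument, either a single number or a `(connect, read)` tuple. The helper name `fetch_page`, the default values, and the optional `session` parameter are assumptions for illustration, not the scraper's existing code.

```python
def fetch_page(url, connect_timeout=10.0, read_timeout=60.0, session=None):
    """Fetch a URL with explicit connect and read timeouts and return the body text."""
    if session is None:
        # Third-party dependency, imported lazily so a stub session can be passed in.
        import requests

        session = requests.Session()
    # requests treats a 2-tuple as (connect timeout, read timeout), in seconds.
    response = session.get(url, timeout=(connect_timeout, read_timeout))
    response.raise_for_status()  # turn HTTP error statuses into exceptions
    return response.text
```

The timeout does not make a slow host respond, but it does cap how long a run can hang before the failure surfaces and can be retried or reported.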

UTF-8 implicitly doing bad stuff on Windows

#201 was caused by two different libraries making assumptions about how to save text files. This appears to have been developed on Macs and Linux, upon which I *think* Python...
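One general way to avoid this class of problem is to name the encoding whenever a text file is opened. When `encoding` is omitted, Python's `open()` falls back to a locale-dependent default in most current versions, and on Windows that default is often a legacy code page rather than UTF-8. The helper names below are hypothetical, and the sketch does not reproduce the two libraries involved in #201.

```python
def write_text_utf8(path, text):
    """Write text to path as UTF-8, regardless of the platform's default encoding."""
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(text)


def read_text_utf8(path):
    """Read text from path, decoding it as UTF-8."""
    with open(path, "r", encoding="utf-8") as handle:
        return handle.read()
```

If a library does the reading or writing itself, the equivalent fix is usually to pass whatever encoding option that library exposes rather than relying on its default.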