Mike Stucka
You're using Python 3, and that's a Python 2 program. Use parentheses: print(soup.prettify())
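A minimal sketch of the difference, in case it helps. `FakeSoup` and `compiles` are hypothetical stand-ins invented for this example so that it runs without BeautifulSoup installed; only the two `print` forms come from the comment above.

```python
# Python 2 statement form; Python 3 rejects this at parse time.
py2_source = "print soup.prettify()"
# Python 3 form: print is an ordinary function and needs parentheses.
py3_source = "print(soup.prettify())"


def compiles(source):
    """Return True if `source` parses on the interpreter running this script."""
    try:
        compile(source, "<example>", "exec")
        return True
    except SyntaxError:
        return False


class FakeSoup:
    """Hypothetical stand-in for a BeautifulSoup object (assumption, not bs4)."""

    def prettify(self):
        return "<html>\n <body>\n </body>\n</html>"


soup = FakeSoup()
print(soup.prettify())  # the corrected call
```

On a Python 3 interpreter, `compiles(py2_source)` should come back `False` and `compiles(py3_source)` should come back `True`.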
Triggering tests by closing and reopening.
OK, so for the record I've done some terrible things to @Ash1R's draft, and I hope to do more soon and get this into production. - Realized naming scheme for...
@Ash1R, I've got a bunch more validation in the scraper. I incorporated the fixes made by @jsvine but then had to go farther off the reservation to patch an...
Seeing some data integrity problems with edge cases that bump up against the "every other row has the layoff number" logic. A good example: https://mdes.ms.gov/media/26893/PY2011_Q1_WARN_July2011_Sep2011.pdf Another...
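One way to surface those edge cases is to check the alternating-row assumption explicitly instead of trusting it. This is only a sketch: `pair_rows` and the sample rows are hypothetical and are not taken from the scraper or from the PDF linked above. It assumes rows arrive as plain strings, with a company row followed by a row holding the layoff count.

```python
def pair_rows(rows):
    """Pair rows as (company, layoff_count) and collect pairs that break the pattern.

    Assumes (hypothetically) that even-indexed rows hold company names and
    odd-indexed rows hold the layoff number.
    """
    good, suspect = [], []
    for i in range(0, len(rows) - 1, 2):
        name = rows[i].strip()
        count = rows[i + 1].strip().replace(",", "")
        if count.isdigit():
            good.append((name, int(count)))
        else:
            # The second row is not a plain integer, so flag the pair for review.
            suspect.append((i, rows[i], rows[i + 1]))
    if len(rows) % 2:
        # A trailing row with no partner also breaks the assumption.
        suspect.append((len(rows) - 1, rows[-1], None))
    return good, suspect


# Made-up sample rows, not real WARN data.
rows = ["Acme Corp", "120", "Widget Inc", "see attached", "Orphan row"]
good, suspect = pair_rows(rows)
```

Logging `suspect` rather than silently dropping those rows makes it easier to see which PDFs need a different parsing rule.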
The PDF parsing is still failing in some interesting ways. I tried to get the historical data cleaned up but found most of a page missing, e.g., 152801_py2018_q4_warn_apr2019_jun2019.pdf I tweaked...
@dchiueh Shame on me! I had the same experience last week but never came back to say so. I did find some in-house attorneys who presumably could point us in...
Interestingly, well into 2023 that page was still active, and pointed at WARN notices only through early 2021: https://web.archive.org/web/20230428092026/https://www.ddec.pr.gov/images/WARN%20NOTICE%20LAW%20%202019-2020.pdf https://web.archive.org/web/20230428092029/https://www.ddec.pr.gov/images/WARN%20NOTICE%20LAW%20%202021.pdf But these sheets are in English, so it's possible the...
Thank you, @chriszs! I wrote to her on July 10 and never heard back, so my fingers are crossed.
The HTML parser trashes the `p` tags but I'm wondering if that might be contributing to some of the problems here? https://www.dli.pa.gov/Individuals/Workforce-Development/warn/notices/Pages/April-2020.aspx In April 2020, for example, I see the...