warn-scraper

Parse data from NY's historical PDFs

Open • jsvine opened this issue 2 years ago • 3 comments

Responding to the call-out here: https://github.com/biglocalnews/warn-scraper/issues/476

This being my first commit to the project, and not knowing how the maintainers would like to handle the overlap between the data sources, I tried to take the least destructive approach.

And I think I obeyed the Contribution guidelines, but don't hesitate to holler if I haven't.

Also, in order to get the pre-commit hook to pass, I had to upgrade two dependencies in .pre-commit-config.yaml, due to changes in psf/black. (Cf.: https://github.com/psf/black/pull/2966 and https://github.com/asottile/blacken-docs/issues/141.) Let me know if there's another way you'd like that handled.

jsvine, May 17 '22 00:05

Added a commit addressing your (very reasonable) requests 👍

jsvine, May 25 '22 22:05

Great. Thank you. Do we have any overlap and duplication between the PDFs and our other data sources?

palewire, May 25 '22 23:05

Looks like there's quite a bit of overlap, both in actual data and timeframes. I don't know enough, however, about the history/provenance of the non-PDF data files to suggest which source(s) should take precedence.

jsvine, May 26 '22 18:05