scrapers
Add scrapers style requirements to readme / templates
The task:
- [ ] Represent these requirements in the scrapers readme or template as appropriate
- [ ] Demonstrate them by creating an example scraper that meets the criteria
Good scrapers:
- Scraper must be able to pick up where it left off, i.e., fetch only the differences since the last run rather than doing a complete grab each time.
- Scraper saves its files to our Hadoop.
- Scraper saves metadata to our database (Dolt or PostgreSQL).
- Scraper produces a SHA256 and an MD5 hash for every file it generates and records them in the database.
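The hashing requirement in the last bullet could be sketched as a single-pass helper; the function name is illustrative, and recording the result in the database is left to the caller:

```python
import hashlib

def file_digests(path, chunk_size=1 << 20):
    """Compute SHA256 and MD5 of a file in a single pass over its bytes."""
    sha256 = hashlib.sha256()
    md5 = hashlib.md5()
    with open(path, "rb") as f:
        # Read in chunks so large scraped files never need to fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            sha256.update(chunk)
            md5.update(chunk)
    return sha256.hexdigest(), md5.hexdigest()
```

Reading the file once and feeding both hash objects avoids a second pass over what may be a large grab.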
A separate script can be used for this. The workflow would be something like:

`scraper > extractor > saver`
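A minimal sketch of that scraper > extractor > saver pipeline, with hypothetical record fields, an in-memory stand-in for Hadoop and the database, and a timestamp checkpoint for the incremental-grab requirement:

```python
def scrape(since, source):
    # Incremental grab: keep only records newer than the last run's checkpoint.
    # ISO-8601 timestamps compare correctly as strings.
    return [r for r in source if r["fetched_at"] > since]

def extract(records):
    # Normalize raw records into (content, metadata) pairs.
    return [(r["body"], {"url": r["url"], "fetched_at": r["fetched_at"]})
            for r in records]

def save(items, store):
    # Stand-in for writing files to Hadoop and metadata rows to the database.
    for content, meta in items:
        store.append({**meta, "size": len(content)})

# Illustrative data: one record before the checkpoint, one after.
source = [
    {"url": "https://example.org/a", "body": "old", "fetched_at": "2024-01-01T00:00:00Z"},
    {"url": "https://example.org/b", "body": "new", "fetched_at": "2024-06-01T00:00:00Z"},
]
store = []
save(extract(scrape("2024-03-01T00:00:00Z", source)), store)
# Only the record newer than the checkpoint reaches the store.
```

Keeping the three stages as separate functions matches the idea above of hashing (or any other step) living in its own script.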
Questions:
- Where would they save the keys? Keys or developer API tokens, similar to those GitHub and other cloud services use, can be stored in the config file of the individual scraper.
- Does the script have to generate its own key? No; we generate keys on the server and assign them to scrapers.
- Do all the scrapers just use a common key located on the scraping server? No; each scraper will have its own.
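Reading a server-assigned token from a per-scraper config file might look like the following; the section and key names are assumptions, and a real scraper would read from a file path rather than an inline string:

```python
import configparser

# Contents of a hypothetical per-scraper config file (e.g. scraper.ini).
EXAMPLE_CONFIG = """
[auth]
api_token = example-token-assigned-by-server
"""

config = configparser.ConfigParser()
config.read_string(EXAMPLE_CONFIG)  # a real scraper would use config.read("scraper.ini")
token = config["auth"]["api_token"]
```

Because each scraper has its own file, rotating or revoking one scraper's key never touches the others.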