
Mechanism for polling for new scraping updates without restarting Chromium

Open simonw opened this issue 2 years ago • 1 comment

I had the idea in this tweet:

Yeah, if you want a high frequency you should absolutely run this on its own box

Might be value in supporting that directly in the tool, since it would save it having to launch a brand new Chromium instance each time if it stayed running

If you're scraping a frequently updating resource, it would be neat if you didn't have to instantiate an entirely new Chromium instance every time you ran the scraper.

simonw • Mar 14 '22 02:03

Could look something like this:

shot-scraper javascript simonwillison.net --poll 30 -i scrape.js -o output.json

The catch is that overwriting that file every 30 seconds isn't particularly useful if it just means the previous data is lost.

Since I mainly plan to use this with git scraping, one solution could be to run something like this:

shot-scraper javascript simonwillison.net --poll 30 -i scrape.js -o output.json \
  --cmd 'git commit -a -m "Updated" && git push'

The script would then execute the --cmd after each scrape.
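
A rough sketch of how that loop could work, assuming Playwright's sync API is what drives Chromium. The --poll and --cmd options are only proposals at this point, and the function names poll and poll_with_chromium are hypothetical, not anything that exists in the tool:

```python
import json
import subprocess
import time
from pathlib import Path


def poll(scrape, output_path, interval, cmd=None, iterations=None, sleep=time.sleep):
    """Call scrape() repeatedly, writing each result to output_path as JSON.

    scrape: zero-argument callable returning JSON-serializable data.
    cmd: optional shell command run after each write (the proposed --cmd).
    iterations: stop after this many scrapes; None means run until interrupted.
    """
    count = 0
    while iterations is None or count < iterations:
        Path(output_path).write_text(json.dumps(scrape(), indent=2))
        if cmd:
            # A failing command (e.g. nothing to commit) should not stop polling.
            subprocess.run(cmd, shell=True, check=False)
        count += 1
        if iterations is None or count < iterations:
            sleep(interval)
    return count


def poll_with_chromium(url, javascript, output_path, interval, cmd=None, iterations=None):
    """Keep a single Chromium instance open and re-run the JavaScript on each poll."""
    # Imported here so the polling loop above is usable without Playwright installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page()

        def scrape():
            page.goto(url)  # navigate again so each poll sees fresh content
            return page.evaluate(javascript)

        try:
            return poll(scrape, output_path, interval, cmd, iterations)
        finally:
            browser.close()
```

The example command above would then correspond to something like poll_with_chromium("https://simonwillison.net/", Path("scrape.js").read_text(), "output.json", 30, cmd='git commit -a -m "Updated" && git push'). The browser is launched once and only the page navigation repeats, which is the saving this issue is about.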

simonw • Mar 14 '22 02:03