shot-scraper
Mechanism for polling for new scraping updates without restarting Chromium
Had the idea in this Tweet:
Yeah, if you want a high frequency you should absolutely run this on its own box
Might be value in supporting that directly in the tool, since it would save it having to launch a brand new Chromium instance each time if it stayed running
If you're scraping a frequently updating resource, it would be neat if you didn't have to instantiate an entirely new Chromium instance every time you ran the scraper.
Could look something like this:
shot-scraper javascript simonwillison.net --poll 30 -i scrape.js -o output.json
The catch here is that it's not particularly useful to overwrite that file every 30 seconds if that just means the previous data will be lost.
Since I plan mainly to use this with git scraping, one solution could be to run something like this:
shot-scraper javascript simonwillison.net --poll 30 -i scrape.js -o output.json \
--cmd 'git commit -a -m "Updated" && git push'
The script would then execute the --cmd command after each scrape.
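To make the idea concrete, here is a rough Python sketch of what such a polling loop might do internally. It is not how shot-scraper is implemented, and `--poll` and `--cmd` are only proposed in this issue. The names `poll_loop` and `run_with_chromium`, and the `max_runs` and `sleep` parameters, are hypothetical and exist only for illustration. The sketch assumes Playwright for Python is installed and that `scrape.js` contains a JavaScript expression that returns JSON-serializable data.

```python
import json
import subprocess
import time
from pathlib import Path


def poll_loop(scrape, output, interval, cmd=None, max_runs=None, sleep=time.sleep):
    """Call scrape() repeatedly, write each result to output, then run cmd.

    max_runs and sleep are hypothetical extras that make the loop easy to
    stop and to test; the proposal itself only describes --poll and --cmd.
    """
    runs = 0
    while max_runs is None or runs < max_runs:
        result = scrape()
        # Overwrite the output file with the latest scrape result.
        Path(output).write_text(json.dumps(result, indent=2))
        if cmd:
            # shell=True so that chained commands using '&&' work.
            # check=False so a failing command (for example a git commit
            # with nothing to commit) does not stop the polling.
            subprocess.run(cmd, shell=True, check=False)
        runs += 1
        if max_runs is None or runs < max_runs:
            sleep(interval)
    return runs


def run_with_chromium():
    # Imported here so poll_loop can be used without Playwright installed.
    from playwright.sync_api import sync_playwright

    script = Path("scrape.js").read_text()
    with sync_playwright() as p:
        browser = p.chromium.launch()  # launched once, reused for every poll
        page = browser.new_page()

        def scrape():
            # Reload the page on each poll and evaluate the script in it.
            page.goto("https://simonwillison.net/")
            return page.evaluate(script)

        try:
            poll_loop(
                scrape,
                "output.json",
                30,
                cmd='git commit -a -m "Updated" && git push',
            )
        finally:
            browser.close()
```

Calling `run_with_chromium()` would poll until interrupted. The point of the sketch is that the browser and page are created once, outside the loop, so each poll only pays for a page reload rather than a fresh Chromium launch.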