web-monitoring-processing
Allow limiting imports to one version per day
Some pages are captured very frequently by the Internet Archive, and it’s not necessary or valuable for us to import and track every one of those captures. You can now set --skip-unchanged day to import at most one version per day (approximately; there are some cases where we might still wind up importing more).
I’ve been using this when loading historical data for new URLs we track, but I’m not using it for our regular nightly imports of all URLs.
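
For reviewers who want a mental model of the "one version per day" behavior, here is a minimal sketch of the general idea. It is not the code in this PR: the function name `one_per_day`, the `(url, timestamp)` tuple shape, and the choice to bucket by UTC calendar date and keep the first capture seen are all assumptions made for illustration.

```python
from datetime import datetime, timezone


def one_per_day(captures):
    """Yield at most one capture per (url, UTC date), keeping the first seen.

    `captures` is an iterable of (url, timestamp) tuples with
    timezone-aware timestamps. Hypothetical helper, for illustration only.
    """
    seen = set()
    for url, timestamp in captures:
        # Bucket each capture by URL and by its calendar day in UTC.
        key = (url, timestamp.astimezone(timezone.utc).date())
        if key in seen:
            continue  # Already kept a capture of this URL for this day.
        seen.add(key)
        yield url, timestamp


# Made-up sample data: two captures on Jan 1, one on Jan 2.
captures = [
    ("https://example.gov/", datetime(2020, 1, 1, 3, 0, tzinfo=timezone.utc)),
    ("https://example.gov/", datetime(2020, 1, 1, 15, 0, tzinfo=timezone.utc)),
    ("https://example.gov/", datetime(2020, 1, 2, 9, 0, tzinfo=timezone.utc)),
]
kept = list(one_per_day(captures))
```

With the sample data above, the second capture on January 1 is dropped and the other two are kept. The real importer may well choose which capture to keep differently, which is one reason the limit is only approximate.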