web-monitoring-processing
Tools for accessing, "diff"-ing, and analyzing archived web pages
Some pages have a `<link rel="canonical">` element in their markup, indicating a correct, “canonical” URL for the page (more info here: https://en.wikipedia.org/wiki/Canonical_link_element). When importing data from the Wayback Machine, it...
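For illustration, here is a minimal sketch of how a canonical URL could be read out of a page's markup using only Python's standard library. The names `CanonicalLinkParser` and `find_canonical_url` are hypothetical and are not part of this project's code; the importer may handle this differently.

```python
from html.parser import HTMLParser


class CanonicalLinkParser(HTMLParser):
    """Records the href of the first <link rel="canonical"> element seen."""

    def __init__(self):
        super().__init__()
        self.canonical_url = None

    def handle_starttag(self, tag, attrs):
        # Only <link> elements matter, and only the first canonical one.
        if tag != "link" or self.canonical_url is not None:
            return
        attributes = dict(attrs)
        # `rel` is a space-separated, case-insensitive list of tokens.
        rel_tokens = (attributes.get("rel") or "").lower().split()
        if "canonical" in rel_tokens and attributes.get("href"):
            self.canonical_url = attributes["href"].strip()


def find_canonical_url(html_text):
    """Return the canonical URL declared in `html_text`, or None."""
    parser = CanonicalLinkParser()
    parser.feed(html_text)
    return parser.canonical_url
```

The returned `href` may be relative, so a caller would likely need to resolve it against the URL the page was captured from (for example with `urllib.parse.urljoin`) before comparing it to known page URLs.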
Long ago, we worked around an issue where we were getting lots of connection failures from Wayback with a dirty hack: if we ran out of retries but still had...
The Internet Archive import script(s) (`wm import ia` and `wm import ia-known-pages`) should have an option that causes them to upload Mementos to S3: ```sh $ wm import ia 'http://www.epa.gov/'...
The import script has gotten pretty crazy and messy over time, and we could remove a lot of the complexity. Some is just because it’s taken us a while to...
We will want to run tests against real lists of changes that were flagged for review. Some of the elements of these lists are already public because they are the...
As a first test of all the things needed to automatically rate a change’s significance and priority, let’s start with something simple that looks for changes that we can pretty confidently...