web-monitoring-db
An HTTP API for tracking and annotating changes to a set of web pages.
⚠️ **Work in progress!** ⚠️ This adds a rake command to export the contents of the DB into a SQLite file for public archiving. It's *mostly* a pretty straightforward copy...
This project has a nearly complete Ruby port of [the Internet Archive’s SURT Python package](https://github.com/internetarchive/surt) buried in the [`app/lib/` directory](https://github.com/edgi-govdata-archiving/web-monitoring-db/tree/main/app/lib): https://github.com/edgi-govdata-archiving/web-monitoring-db/blob/3bb7e8a8960af75f7d05be86f58a88d055cfc79e/app/lib/surt.rb#L3-L20 I wrote it because we needed URL canonicalization...
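For readers unfamiliar with SURT, here is a rough sketch of the core idea: reverse the host labels so that URLs from the same domain sort together. It is not the Ruby port or the Internet Archive package; `simple_surt` is a hypothetical helper that skips the canonicalization rules the real libraries apply, and its output should not be assumed to match theirs.

```python
from urllib.parse import urlsplit


def simple_surt(url: str) -> str:
    """Return a simplified SURT-style key for a URL (illustration only)."""
    parts = urlsplit(url)
    # urlsplit exposes the host lowercased and without any port or userinfo.
    host = parts.hostname or ""
    labels = [label for label in host.split(".") if label]
    # Reverse the labels: "www.epa.gov" becomes "gov,epa,www".
    key = ",".join(reversed(labels)) + ")" + (parts.path or "/")
    if parts.query:
        key += "?" + parts.query
    return key


print(simple_surt("https://www.EPA.gov/climate/index.html?b=2"))
```

Because the most significant part of the host comes first, sorting these keys groups pages by domain, which is what makes the format useful for lookups by URL prefix.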
You can get the changes related to a page at:

```
/api/v0/pages/{page ID}/changes
```

This should also be available at the top level:

```
/api/v0/changes
```

This requires:

- Changes...
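As a small illustration of the two paths mentioned above, the sketch below builds them from a page ID. The helper names are hypothetical, and the top-level path reflects the proposal in this issue rather than a confirmed endpoint.

```python
from urllib.parse import quote

API_ROOT = "/api/v0"


def page_changes_path(page_id: str) -> str:
    """Path for the changes belonging to one page."""
    # Percent-encode the ID so it is always a single path segment.
    return f"{API_ROOT}/pages/{quote(page_id, safe='')}/changes"


def all_changes_path() -> str:
    """Proposed top-level path for changes across all pages."""
    return f"{API_ROOT}/changes"


print(page_changes_path("abc-123"))
print(all_changes_path())
```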
It would be nice to have a way to associate legacy Annotations, which I assume will be subjected to a lot of analysis, with Versions in our app. Somehow getting...
If an import results in the creation of a new page record, but fails when creating the actual *version* of that page (because of invalid data), the empty page record...
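One common way to avoid this kind of orphaned record is to create the page and its version inside a single transaction, so a failure on the version rolls back the page too. The sketch below shows the pattern with Python's `sqlite3` module purely for illustration. The table names, columns, and `import_page_version` helper are assumptions, not this project's schema or its actual fix.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Create an in-memory database with a hypothetical, minimal schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE pages (id INTEGER PRIMARY KEY, url TEXT NOT NULL)")
    conn.execute(
        "CREATE TABLE versions ("
        "id INTEGER PRIMARY KEY, "
        "page_id INTEGER NOT NULL REFERENCES pages(id), "
        "body TEXT NOT NULL)"
    )
    return conn


def import_page_version(conn: sqlite3.Connection, url: str, body) -> None:
    """Insert a page and its first version as one atomic unit."""
    # The connection context manager commits on success and rolls back
    # both inserts if either one raises.
    with conn:
        cur = conn.execute("INSERT INTO pages (url) VALUES (?)", (url,))
        conn.execute(
            "INSERT INTO versions (page_id, body) VALUES (?, ?)",
            (cur.lastrowid, body),
        )
```

With this shape, importing invalid version data (for example a missing `body`) raises an error and leaves no page row behind.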
Listening to analyst discussions, sometimes there are particular terms or changes that it would be really helpful to search for other instances of (e.g. “state cooperation”). It would be great...
At the analyst meeting today, I mentioned that one of our next priorities after Wayback support and deployment fixes is revisiting annotations and actually making them usable. In the short...