web-monitoring-db
Ingest the legacy UUIDs from the old Versionista outputter
It would be nice to have a way to associate legacy Annotations, which I assume will be subjected to a lot of analysis, with Versions in our app. Getting the UUIDs generated by the old outputter (which I think are now stored only in Google Sheets) sounds slightly painful, but possible and useful.
Some notes for this:
Since rows in those spreadsheets are really versions, this should probably match on Versionista version IDs. We can extract them from:
- In the spreadsheets: the `Last Two - Side by Side` column is always `https://versionista.com/{site}/{page}/{version_id}:0` (`version_id` is universally unique within Versionista, so all the other fields can be ignored)
- In the DB: `version_record.source_metadata.version_id`
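Pulling the ID out of the spreadsheet column only requires the last path segment of that URL. A minimal sketch, assuming the URL always has exactly the `/{site}/{page}/{version_id}:0` shape described above; the function name `extract_version_id` is hypothetical:

```python
from urllib.parse import urlparse


def extract_version_id(url: str) -> str:
    """Return the Versionista version ID from a 'Last Two - Side by Side' URL.

    Assumes the path looks like /{site}/{page}/{version_id}:0. Only the
    version_id segment is kept, since it is unique within Versionista.
    """
    path = urlparse(url.strip()).path
    # Drop empty segments so a trailing slash does not matter.
    segments = [segment for segment in path.split("/") if segment]
    if len(segments) != 3:
        raise ValueError(f"Unexpected Versionista URL: {url!r}")
    # "{version_id}:0" -> "{version_id}"
    version_id, _, _ = segments[2].partition(":")
    return version_id
```

For example, a (made-up) URL such as `https://versionista.com/111/222/333:0` yields `"333"`.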
The naive way to do this right now would be to page through all the results of https://web-monitoring-db.herokuapp.com/api/v0/versions?source_type=versionista
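That paging approach might look like the sketch below, which builds an in-memory map from Versionista `version_id` to DB UUID. It assumes each page is JSON with a `data` list of versions (each with `uuid` and `source_metadata.version_id`) and a `links.next` URL that is empty on the last page; these response details, and the helper names `iter_versions` and `build_version_index`, are assumptions rather than a documented contract. The fetch function is injectable so the logic can be exercised without network access.

```python
import json
from typing import Callable, Dict, Iterable, Iterator, Optional
from urllib.request import urlopen

START_URL = (
    "https://web-monitoring-db.herokuapp.com/api/v0/versions"
    "?source_type=versionista"
)


def fetch_json(url: str) -> dict:
    """Download one page of results and decode it as JSON."""
    with urlopen(url) as response:
        return json.load(response)


def iter_versions(
    start_url: str = START_URL,
    fetch: Callable[[str], dict] = fetch_json,
) -> Iterator[dict]:
    """Yield every version, following the (assumed) links.next URL."""
    url: Optional[str] = start_url
    while url:
        page = fetch(url)
        yield from page.get("data", [])
        url = (page.get("links") or {}).get("next")


def build_version_index(versions: Iterable[dict]) -> Dict[str, str]:
    """Map Versionista version_id (as a string) to web-monitoring-db UUID."""
    index: Dict[str, str] = {}
    for version in versions:
        version_id = (version.get("source_metadata") or {}).get("version_id")
        if version_id is not None:
            index[str(version_id)] = version["uuid"]
    return index
```

Building the whole index once and then doing dictionary lookups keeps the matching step cheap, at the cost of one full pass over the API.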
If we wanted to better support this, we could:
- Add some indexing on `source_metadata.versionista_id` and allow querying by that field, or
- Make public DB exports available (#45)
Alternatively, a probably easier approach might be to create an API endpoint for uploading analysts' annotation CSVs. It's kinda messy, but it might be the quickest way to achieve this.
Using the Versionista ID is good enough to support the ad hoc analysis I want to do right now. Once we transition away from Versionista, perhaps we should do a one-time update to the database to ingest all these legacy UUIDs and their associated Annotations.
:+1:
As we move toward supporting multiple differs, an annotation imported from the sheets should have a field indicating where it came from (Versionista, Scanner, and possibly others in the future).
I don't think we'll need to worry about that when importing sheets of annotations. Each annotation (row in the sheets) already refers to a Version in our database, either by its Versionista ID or by web-monitoring-db UUID or both, and each Version already knows where it came from.
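Since each row can point at a Version by DB UUID, by Versionista ID, or both, resolving a sheet row could be sketched like this. It assumes a CSV export with a hypothetical `Version UUID` column plus the `Last Two - Side by Side` URL column, and an index from Versionista `version_id` to DB UUID built beforehand; all function and column names other than `Last Two - Side by Side` are illustrative.

```python
import csv
import io
from typing import Dict, List, Optional, Tuple

UUID_COLUMN = "Version UUID"  # hypothetical column name
DIFF_URL_COLUMN = "Last Two - Side by Side"


def versionista_id_from_url(url: str) -> Optional[str]:
    """Take the last path segment, e.g. '.../{version_id}:0' -> version_id."""
    segments = [segment for segment in url.strip().split("/") if segment]
    if not segments:
        return None
    return segments[-1].partition(":")[0] or None


def resolve_version_uuid(row: Dict[str, str], index: Dict[str, str]) -> Optional[str]:
    """Prefer an explicit DB UUID; otherwise fall back to the Versionista ID."""
    uuid = (row.get(UUID_COLUMN) or "").strip()
    if uuid:
        return uuid
    version_id = versionista_id_from_url(row.get(DIFF_URL_COLUMN) or "")
    return index.get(version_id) if version_id else None


def resolve_sheet(
    csv_text: str, index: Dict[str, str]
) -> List[Tuple[Dict[str, str], Optional[str]]]:
    """Pair each annotation row with its Version UUID (None if unmatched)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [(row, resolve_version_uuid(row, index)) for row in reader]
```

Preferring the explicit UUID when both are present avoids a lookup; rows that resolve to `None` are the ones that would need manual attention.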
Note: the tooling for this was added in #233. Solving this is mainly a matter of running that rake task regularly (or, more involved, setting up a job that does that work on a schedule).
This issue has been automatically marked as stale because it has not had recent activity. It will be closed in seven days if no further activity occurs. If it should not be closed, please comment! Thank you for your contributions.
This will be a requirement for migrating away from Google Sheets for important changes.