h
h copied to clipboard
Add `authuser` to blacklisted query params
A user was having trouble with Hypothesis preserving URL parameters on a Google Site she’s using:
- https://sites.google.com/gcds.net/bperiodshortstories/the-tell-tale-heart?authuser=0 (document_id = 946493)
- https://sites.google.com/gcds.net/bperiodshortstories/the-tell-tale-heart (document_id = 946414)
From this Slack conversation, Jon Udell recommended we add authuser to our blacklisted query params here.
Additional context https://hypothesis.zendesk.com/agent/tickets/10012
I'm reverting this for the moment: https://github.com/hypothesis/h/pull/6106
I think this commit is dangerous in it's present state as users on pages without a "canonical" link, but with a "authuser" will lose access to their annotations. We'll have to think about this a bit more carefully before deploying.
One option is to have a database migration that "re-canonicalises annotations"
This is probably not straight forward at all
There are a couple of different interacting pieces at play here:
- We have the URL normalisation logic at the bottom (the change suggested here)
- This is going to interact with our (fairly complicated) document equivalence system on top
- This could / possibly should result in documents getting merged
- This has an effect on both Postgres rows and Elasticsearch for this to work
Essentially our existing system for deciding whether two documents are the same has an implicit assumption that how we decide two things are the same isn't going to change over time. Long story short, there is no obvious way to tweak a column here, or a function there and have this work correctly.
This doesn't look immediately like a database migration, it looks like a missing capability: to be able to re-index a set of documents under new rules. We'd have to add a new capability to the system, add a way of calling it, and then chose to call it on relevant documents to get them re-indexed correctly.
All of this is with very little digging in the code at the moment, so all conclusions are pretty speculative.
So what now?
A couple of different paths suggest themselves:
- Do nothing:
- Existing annotations remain, but different users continue to appear to be on different documents.
- Essentially don't fix it
- Deploy this fix, no other action:
- All existing annotations on these documents vanish from the users point of view...
- ...but new ones will appear together (yay?)
- Find some way of merging specific docs:
- We could merge the docs for the specific users who raised the issue.
- You can link any two documents in h (irreversibly) and this would 'fix' the issue for these specific users, on their specific page.
- Doesn't help for any other pages, including for these users.
- Support labour intensive, and would also require some work to stream line this.
- o probably a non-starter
- Investigate a new capability:
- Invest time in working out how to get the best of all worlds.
- This could be expensive.
- We don't know how long this would take. Probably not quick
I think at this point we need to stop and re-assess the effort/reward ratio on this. When this was looking like a low risk quick fix, it was a no brainer to help our our users. With increasing cost / complexity we potentially delay features for other customers and introduce the risk of making mistakes with users data.
It's possible the overall user impact of fixing this may be negative at this time. This isn't a technical question though.
@jon-betts thanks so much for summing up our options. While we aren't necessarily planning to bring this into a sprint, @dwhly does have a few questions he'd like to ask to help us assess whether/how to move forward. I will schedule something after you return from holiday.
@judell we'd like to include you on that call as well. This will be at least 1 week out from today. In the meantime, would you be able/willing to query our Db how many annotations have been made on URLs that contain "authuser" (and, if possible, how many of those are authuser=0 vs authuser=1, etc)?
Annotations: 6070 authuser=0: 4239 authuser=1: 1560 authuser=2: 249 authuser=3: 11 authuser=4: 11 authuser=5: 0
Documents: 780
User: 702
I'll echo what Other Jon said: we perturb the doc equiv system at our peril.
It's true that I did say we might want to consider adding authuser to the short list of excluded uri params. But if we do so, I wouldn't expect anything historical to get fixed, and would be very leery of trying to renormalize/reindex. Excluding that param would just be a way to ensure equivalence going forward.
@judell thanks for this - and I have pity for anyone who is managing 5 Google accounts!
It will be super helpful to have this data handy when we talk after Jon's return.
@klemay I think this has been deprioritised for now, so I'm going to sling it out of the sprint.
The last I think I know about this is that Jon collected some data about this in: https://docs.google.com/document/d/1TRLOzc2BARFshIYNNIozWX-ni8IaoYr8MU5S4R-91ek
There are on-going talks about document merging in general and how it might be quite broken:
https://hypothes-is.slack.com/archives/C4K6M7P5E/p1596623326487000
So as far as I'm aware, there isn't / we can't make a decision about this right now.
@jon-betts makes sense - we need to do the spikes on document equivalence in general and the document_id issue before taking action here.
https://hypothesis.zendesk.com/agent/tickets/13379