SPIKE: bring the realtime annotation resource into alignment with the API
History & notes can be found here: https://github.com/hypothesis/h/issues/5486
Some notes on this:
We don't have access to the request object proper deep down in the streamer app, but I believe we have some options here. As noted in https://github.com/hypothesis/h/issues/5486, there are spots in the streamer code that are already instantiating services directly (versus using the `request.find_service` approach). The vast majority of services don't need the whole `request` object; most just need the database session for DB access, and we do have access to the session in the streamer.
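For illustration, a minimal sketch of that direct-instantiation pattern, assuming a service whose constructor only needs the session (the service name and signature here are illustrative, not verified against the current codebase):

```python
# A hedged sketch: building a service straight from the SQLAlchemy session
# the streamer already holds, instead of going through request.find_service().
# FlagService(session) and flagged() are assumed signatures, for illustration only.
from h.services.flag import FlagService

def handle_annotation_event(session, annotation, socket):
    flag_service = FlagService(session)  # no request object required
    return flag_service.flagged(socket.user, annotation)
```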
Unfortunately, the `AnnotationJSONPresentation` service takes a boatload of init params (https://github.com/hypothesis/h/blob/c4cd4025fa346605e84bbead7618e1613eb5d96d/h/services/annotation_json_presentation.py#L16), each of which will need to be addressed.
I've walked through the services needed by `AnnotationJSONPresentation`, and most of them can be instantiated within the streamer code using the session. The most challenging exception is the `request.has_permission` method, which gets bandied around here. That one will take some more sleuthing.
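One avenue for that sleuthing: in Pyramid, `request.has_permission` delegates to the registered authorization policy, so the streamer could query that policy directly and build the principal list for the socket's user itself. A hedged sketch, assuming we have the registry in hand and skipping group principals:

```python
# A hedged sketch of replicating request.has_permission without a request.
# Real code would need the user's group principals as well.
from pyramid.interfaces import IAuthorizationPolicy
from pyramid.security import Authenticated, Everyone

def has_permission(registry, user, permission, context):
    policy = registry.queryUtility(IAuthorizationPolicy)
    principals = [Everyone]
    if user is not None:
        principals += [Authenticated, user.userid]
    return bool(policy.permits(context, principals, permission))
```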
My first-swag approach here would be along these prongs:
- See if there is any simplification that could be done to the `AnnotationJSONPresentation` service and the various `formatters` used to render annotations. This part of the app is deeply layered and may benefit from some simplification. The fact that the `AnnotationJSONPresentation` service takes so many init args is a little smelly.
- Figure out how to fill the gaps in request-related items and services needed by `AnnotationJSONPresentation`, the most immediately significant one being `has_permission`. Another spot is the `links` service, which requires some path/URL stuff off of `request`. An option would be not to return `links` with the realtime response, though really that's a symptom of the same problem we're trying to solve here overall...
- Adapt the streamer's annotation responses to use `AnnotationJSONPresentation` in a similar manner to `h.views.api.annotations` (a rough sketch follows this list). This will require some coordination with client code, as the response would change. However, I believe these changes would be additive to the current realtime response: at the end of the day, both the API and the streamer are using the same presenter; it's just that the API's version via `AnnotationJSONPresentation` provides more `formatters` to the presenter and, as such, more JSON data is glommed on.
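As a rough shape for that adaptation, a hedged sketch of the payload composition, with `present()` standing in for whatever the presentation service actually exposes (the message envelope is simplified):

```python
# A hedged sketch: the streamer composing its realtime payload through the
# same presentation service the API views use. The envelope fields and the
# present() method name are illustrative.
def annotation_payload(presentation_service, annotation):
    return {
        "type": "annotation-notification",
        "payload": [presentation_service.present(annotation)],
    }
```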
It's possible to do this, but it's not possible to do it well and quickly for a few reasons:
- Our current "presenter" service is really set up to serialise multiple annotations for a single user
- It's also configured so that the single user is the request user (which we can work around, but it's awkward)
- It's not set up to efficiently send one annotation to multiple users (which is what we need to do)
- It has optimisations for pre-caching work for the one-user, many-annotations setup, but none for the other way around
- This means using it for our case is slow, as it issues many requests to the DB (the loop sketched below shows why)
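To make the cost concrete, here's a hedged sketch of what the per-socket loop amounts to with the current one-user presenter (the factory and socket interfaces are illustrative):

```python
# A hedged sketch of the problem: with the presenter fixed to a single user,
# the streamer rebuilds the service, and re-queries flag/moderation state,
# for every interested socket on every annotation broadcast.
def broadcast(session, annotation, matching_sockets, service_factory):
    for socket in matching_sockets:
        service = service_factory(session, user=socket.user)  # rebuilt per socket
        socket.send(service.present(annotation))  # N sockets -> N sets of DB queries
```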
I have a path for fixing all of this, but it's not small, and I wanted to find out what the priority of all that is. Seeing as this is a spike, I think that's all that's required.
If you want to do it right
My suggestions for fixing this (as refactoring steps, not necessarily as PRs):
- Concentrate on sanity first:
  - Move all of the `formatters`, the presenter, and the service into a single service directory (they are all private to it)
  - Remove the interface stuff
  - Stop sending formatters to the presenter to apply, and just apply them yourself
  - Move the `user is None` check into the flagged service
  - Have a single `is_moderator` check instead of doing it two different ways
  - Stop using the moderation service to check if something is moderated (it's an attribute on the annotation!)
- Refactor the interface to accept an annotation to present and a user to present it to, instead of being provided a fixed user at construction time (see the sketch after this list)
- Inline the formatters:
  - Separate the caching behavior and the formatting behavior
  - Inline all of the formatting behavior into the service (it was about 30 lines when I did it)
  - Make it not cache for now
  - Partial example here: https://github.com/hypothesis/h/pull/6317
- Change the cardinality and introduce caching objects:
  - Create more flexible caching objects for answering whether annotations are flagged by a user etc. (one such object is sketched after the example below)
  - Make it so these can pre-cache what they need in order to answer the questions
  - Make it so these caches work for either many users or many annotations (or both)
  - Push those elements down into the relevant services so anything can use them (if possible)
  - Change the cardinality of the presentation functions to accept lists of users or annotations
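Taken together, the first three groups of steps could land on something like this hedged sketch, where the formatters are inlined and the user is a `present()` parameter rather than fixed at construction (all names, fields, and signatures here are illustrative, not the actual codebase):

```python
# A hedged sketch of the post-refactor service: one class, no formatter
# objects, and the user passed per call instead of baked in at construction.
class AnnotationJSONPresentationService:
    def __init__(self, session, flag_service, links_service):
        self.session = session
        self.flag_service = flag_service
        self.links_service = links_service

    def present(self, annotation, user):
        data = {
            "id": annotation.id,
            "text": annotation.text,
            "tags": annotation.tags,
            "links": self.links_service.get_all(annotation),
        }
        # Formerly the flag formatter: a None user can't have flagged anything.
        data["flagged"] = user is not None and self.flag_service.flagged(user, annotation)
        # Formerly the hidden formatter: moderation state is an attribute on
        # the annotation, so no moderation-service round trip is needed.
        data["hidden"] = annotation.moderation is not None
        if data["hidden"] and not self._is_moderator(user, annotation):
            data.update({"text": "", "tags": []})  # censor content for non-moderators
        return data

    def _is_moderator(self, user, annotation):
        return False  # placeholder for the single is_moderator check
```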
The final call in websocket could then look something like this:
```python
# Precache what you have to here:
presenter = presentation_service.get_presenter_for(annotations=[annotation], users=...)

# Iterate quickly here
for user in users:
    serial = presenter.present(annotation, user)
```
You could add convenience methods for other common cases.
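For illustration, a hedged sketch of one of those flexible caching objects, assuming flags live in a table keyed by user and annotation (the model and column names are assumptions):

```python
# A hedged sketch of a caching object that answers "has this user flagged
# this annotation?" after pre-caching many users x many annotations with a
# single query. Flag and its columns are assumed names.
from h.models import Flag

class FlagCache:
    def __init__(self, session):
        self.session = session
        self._flagged = set()  # (user_id, annotation_id) pairs

    def precache(self, users, annotations):
        rows = (
            self.session.query(Flag.user_id, Flag.annotation_id)
            .filter(Flag.user_id.in_([user.id for user in users]))
            .filter(Flag.annotation_id.in_([ann.id for ann in annotations]))
        )
        self._flagged.update(rows)  # rows iterate as (user_id, annotation_id) tuples

    def flagged(self, user, annotation):
        return (user.id, annotation.id) in self._flagged
```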
Other observations
- The flag count service is private to this too
- The flag service is very close; there's only one `create()` call outside it
- Worth checking the other services
If you want to do it quick
- Add a way of fooling our services into thinking that the user for the socket we are looking at is the current user
- Recreate the service for every socket
- Serialise from scratch each time
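A hedged sketch of that fooling-the-services trick: a stand-in request object carrying just the attributes the service factories actually read (the attribute set and factory wiring here are illustrative; the example PR below is the real version):

```python
# A hedged sketch: a faux "request" carrying only the attributes the
# service factories actually read, recreated (with the service) per socket.
from types import SimpleNamespace

def present_for_socket(session, socket, annotation, service_factory):
    faux_request = SimpleNamespace(
        db=session,
        user=socket.user,  # fool the services: the socket's user "is" the request user
        authenticated_userid=socket.user.userid if socket.user else None,
    )
    service = service_factory(faux_request)  # recreated for every socket
    return service.present(annotation)  # serialised from scratch each time
```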
Example PR: https://github.com/hypothesis/h/pull/6316
This works, but it's slower than it was before we started the optimisation work (i.e. this + optimisation = slower than before). I don't know if this will create a problem, as we rely on the fact that relatively few sockets are interested in any given message. So you might get away with it. You might also get really badly killed if we got enough users on a single page (a large active course or a popular website). This is relatively unlikely, I'd say, as any rate of annotations that might cause us trouble would probably be too much for people to look at anyway, but who knows?
If we are expecting 10x or 100x the number of users, this is going to be a concern.