lms Show only annotations from current assignment by default

Currently Hypothesis fetches annotations based on the document URL/ID and group, and does not take into account the LMS assignment ID. This means that if the same content is re-used across multiple assignments, the same set of annotations will show up on each. This behavior was inherited from the way Hypothesis works outside of the LMS. We have been able to get away with it because most assignments in a course will use different content and in that case each assignment will show a unique set of annotations. However it is problematic for cases where content is re-used, because it is unclear to users which context a particular annotation was made in. Some examples of when this happens:

A long document (such as a book or PDF) is re-used across multiple assignments, with each assignment focusing on a segment of the work.
Content from a third-party provider is being used, and the instructor is not able to break up the content into units that match what they want to teach
An instructor is intentionally re-using the same content, but setting a different task for each assignment

This issue proposes to change the behavior so that by default, Hypothesis will only show annotations from the current assignment. This change would have a number of benefits:

It aligns with the conceptual model that many of our users probably already have about how Hypothesis works
It aligns with how we talk about user activity in various contexts (eg. we send out notifications to instructors saying "your students made 5 annotations on assignment X")
In the context of assignments that are graded, it means that only the relevant student work (ie. annotations made in the context of a particular assignment) is shown when the teacher is grading
When instructors are teaching long documents or books, it avoids the distraction of showing annotations from other assignments using the same content
It makes it easier to support use cases where the same content is intentionally re-used but with different tasks
Storing the data needed to support this could also enable capabilities such as grouping annotations by assignment in a future notebook iteration

As discussed in this Slack thread there are some use cases when it would be useful to surface annotations from the same content in other assignments. For example:

A student wants to export all the notes they made on some content. It would be annoying if they had to collate notes from across multiple assignments
An instructor creates two assignments using the same content, and the student or teacher wants to consult their notes from the other assignment

Migration

There are some constraints to consider in making this change:

We only recently (November 2023) started recording the LMS assignment ID as metadata with annotations. For annotations created earlier we cannot uniquely identify which assignment they are associated with.
Although this change would not be noticeable in most assignments, because they don't use the same content as another assignment in a course, it is a significant conceptual change to how Hypothesis works and we want to avoid causing disruption while an assignment is in the middle of being completed

For these reasons, my thinking is that this new behavior would be limited to assignments created after some cut-off date which is later than when we started recording assignment ID associations with annotations.

If we provided a way to show annotations from all assignments on a document, then conceptually the fallback behavior is equivalent to forcibly turning that option on.
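The cut-off rule and its fallback could be sketched roughly as follows. This is only an illustration under stated assumptions: `CUTOFF`, `show_all_assignments` and the `user_opt_in` flag are hypothetical names, and the actual date has not been decided.

```python
from datetime import datetime, timezone

# Hypothetical cut-off date. The only constraint from the proposal is that it
# must be later than when assignment IDs started being recorded (November 2023).
CUTOFF = datetime(2024, 1, 1, tzinfo=timezone.utc)


def show_all_assignments(assignment_created: datetime, user_opt_in: bool = False) -> bool:
    """Return True if annotations from every assignment on the document are shown."""
    if assignment_created < CUTOFF:
        # Older assignments keep the current behavior, which is equivalent to
        # forcibly turning the "show all assignments" option on.
        return True
    # Newer assignments show only their own annotations unless the
    # (hypothetical) option to show everything is enabled.
    return user_opt_in
```

With this shape, old assignments never change behavior mid-way through, and the fallback does not need a separate code path beyond the date check.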

Related discussions

Slack thread in #prod-general with the initial proposal: https://hypothes-is.slack.com/archives/C4K6M7P5E/p1699266629799979
Slack thread in #support soliciting feedback on making this change: https://hypothes-is.slack.com/archives/C2BLQDKHA/p1699268047455859

Implementation notes

We currently record assignment ID with annotations by having the LMS app pass an opaque blob of JSON to the client which is then posted to h and stored as a schemaless JSON blob in h's DB. This is convenient and flexible from the LMS app's perspective, but doesn't robustly guard against mistakes in transiting the data and doesn't allow efficient filtering on the h side. If we are going to start using this to affect what the user sees when they load an assignment, we are going to need to store the data in h in a way that is more robust and allows more efficient querying.
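To illustrate what "more robust" could mean in practice, here is a minimal sketch of validating the metadata before it is stored, instead of accepting an arbitrary blob. The `AssignmentMetadata` class, the `parse_metadata` function and the `assignment_id` key are assumptions for illustration, not the current shape of the data in h.

```python
from dataclasses import dataclass
from typing import Any, Mapping


@dataclass(frozen=True)
class AssignmentMetadata:
    """Hypothetical typed form of the metadata the LMS app sends with an annotation."""

    assignment_id: str


def parse_metadata(blob: Mapping[str, Any]) -> AssignmentMetadata:
    """Validate the JSON blob, raising ValueError rather than storing bad data."""
    value = blob.get("assignment_id")
    if not isinstance(value, str) or not value:
        raise ValueError("metadata must contain a non-empty string 'assignment_id'")
    return AssignmentMetadata(assignment_id=value)
```

Once the value is validated into a known field, it could also be stored in an indexed column or search field, which is what would make filtering efficient.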

One general idea I had was based on the observation that there is a correspondence between data model concepts in h and lms, with the h equivalents being more general. eg. The lms app has students and teachers. h has users. The lms app has courses and reading groups, h has groups. We could take a similar approach with assignments. The LMS app knows about assignments, h would know about activities or contexts. An assignment would have an associated activity/context ID (some kind of URI) which could be filtered on. Alternatively we could just introduce an explicit "assignment" concept in h.

Nov 07 '23 11:11 robertknight

We currently record assignment ID with annotations by having the LMS app pass an opaque blob of JSON to the client which is then posted to h and stored as a schemaless JSON blob in h's DB. This is convenient and flexible from the LMS app's perspective, but doesn't robustly guard against mistakes in transiting the data and doesn't allow efficient filtering on the h side. If we are going to start using this to affect what the user sees when they load an assignment, we are going to need to store the data in h in a way that is more robust and allows more efficient querying.

While I agree with the general thoughts here in theory, I don't think that in practice this should necessarily translate into a change to how assignments are stored in H's DB right now.

Annotations (IMO) must be queried by group regardless of this change. Group is the main concept in H's security model and that should not change.

This means that in practice the query to filter out annotations from other assignments will start from the current subset of annotations (the ones that belong to the group) and filter those. This holds whether it is implemented in SQL, Python or JS.

I reckon the main task of this issue is to figure out the UI/UX around the new annotation/assignment concept, things like:

Is this configured per assignment, or is it the default for new ones?
Is there going to be UI to access the rest of the annotations (from other assignments of the same document)?
How do we handle old assignments vs new ones?

I reckon data migrations, new concepts in H's code base... will distract us from those.

My ideal solution to the technical issue of filtering out the annotations would be one of the following:

Doing it client side. The LMS frontend already passes the client some metadata to attach to new annotations. This could be extended to filter which annotations are displayed based on that metadata. The client would fetch the same amount of data that it fetches now.
Doing it server side but based on the metadata concept. Mirror the metadata from the POST API on the GET endpoint, with something along the lines of ?metadata.assignment_id=XXXX

We don't need to find a solution that generalizes to all future metadata needs. A simple solution like this one will go a long way.
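Both options can be sketched in a few lines. This is a rough illustration only, written in Python for brevity even though the client itself is JS. The `metadata` and `assignment_id` keys and the `metadata.assignment_id` query parameter are hypothetical; `uri` and `group` are assumed to be the existing search parameters.

```python
from typing import Iterable
from urllib.parse import urlencode


def filter_by_assignment(annotations: Iterable[dict], assignment_id: str) -> list[dict]:
    """Client-side option: keep only annotations recorded against this assignment."""
    return [
        ann
        for ann in annotations
        # Annotations with no metadata (e.g. older ones) never match.
        if (ann.get("metadata") or {}).get("assignment_id") == assignment_id
    ]


def search_query(uri: str, group: str, assignment_id: str) -> str:
    """Server-side option: build a query string with a hypothetical metadata filter."""
    return urlencode(
        {"uri": uri, "group": group, "metadata.assignment_id": assignment_id}
    )
```

In both cases the group filter still applies first, so the assignment filter only ever narrows a set of annotations the user is already allowed to see.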

Dec 11 '23 15:12 marcospri

While I agree with the general thoughts here in theory, I don't think that in practice this should necessarily translate into a change to how assignments are stored in H's DB right now. Annotations (IMO) must be queried by group regardless of this change. Group is the main concept in H's security model and that should not change.

To be clear, I'm not proposing we change the fact that annotations are filtered based on group. That's obviously still necessary because there can be multiple students in different groups doing the same assignment.

I think the filtering by assignment ought to be done server side, for several reasons:

I anticipate there will be a need to query/filter annotations by assignment in other contexts (eg. a dashboard, API access for college admins)
The assignment ID is being elevated to a similar level of importance as the document ID. As such I think it makes sense to somehow make it a first-class citizen in the API, internally in h and ultimately in the h DB structure. We're probably not going to get there in one step, but I think it makes sense to take steps in that direction.

I reckon the main task of this issue is to figure out the UI/UX around the new annotation/assignment concept, things like: Is this configured per assignment, or is it the default for new ones? Is there going to be UI to access the rest of the annotations (from other assignments of the same document)? How do we handle old assignments vs new ones?

Sure, these things are important and it makes sense to discuss them up front in case things come up which we need to talk to folks outside the dev team about, or they affect the implementation.

I reckon data migrations, new concepts in H's code base... will distract us from those.

The problem if we don't eventually reify these domain concepts in the schema(s) of h is that it will become harder for anyone who interacts with the system to grok how it actually works.

To give an example, we could in principle stuff the assignment ID in a special tag which the client hides from the user. It would be superficially very convenient - no need to change the DB, ES etc. However it also means we wouldn't be enforcing the structure of those tags. It also wouldn't be obvious to anyone looking at API responses or the internals of the h code that this thing exists and is very important to how the system operates.

Dec 11 '23 16:12 robertknight