
Record source state changes and expose them via the API

Open • eloquence opened this issue 9 months ago • 1 comment

To enable more effective synchronization, such as the strategies discussed in https://github.com/freedomofpress/securedrop-client/issues/1549, we will very likely want to let clients fetch only those sources whose state has changed. State changes may include:

  • New source messages/files (covered with the existing last_updated timestamp)
  • New journalist replies
  • Star status changes
  • Changes to seen/unseen state of any associated messages or files
  • Deletion of files or messages without deletion of the source itself
  • "Read receipts" from sources (triggered when a source "deletes" journalist replies)

Any such state change should cause a client to update its representation of the source.

This could be implemented as, e.g.:

  • a timestamp
  • a version integer

A timestamp may be appealing because it would let the client specify a cutoff in the sync query itself and receive only sources updated after that time. However, a source that has been deleted entirely leaves no row to match such a query, so tracking deletions of whole sources would require additional support.
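To make the timestamp trade-off concrete, here is a minimal sketch. It assumes a hypothetical per-source `last_changed` timestamp, bumped on every state change listed above, and a hypothetical tombstone record written when a whole source is deleted. `SourceRow`, `Tombstone`, and `changes_since` are illustrative names only. The cutoff filter over sources reports changed sources but can never report deleted ones; the tombstones fill that gap.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class SourceRow:
    uuid: str
    last_changed: datetime  # hypothetical: bumped on any source state change


@dataclass
class Tombstone:
    uuid: str
    deleted_at: datetime  # hypothetical: written when a whole source is deleted


def changes_since(sources, tombstones, cutoff):
    """Return (changed, deleted) source UUIDs with activity strictly after `cutoff`."""
    changed = sorted(s.uuid for s in sources if s.last_changed > cutoff)
    # A deleted source has no row left in `sources`, so the filter above can
    # never report it; the tombstones are the only record of the deletion.
    deleted = sorted(t.uuid for t in tombstones if t.deleted_at > cutoff)
    return changed, deleted


t0 = datetime(2025, 4, 1, tzinfo=timezone.utc)
t1 = datetime(2025, 4, 2, tzinfo=timezone.utc)
t2 = datetime(2025, 4, 3, tzinfo=timezone.utc)
sources = [SourceRow("a", t0), SourceRow("b", t2)]
tombstones = [Tombstone("c", t2)]
print(changes_since(sources, tombstones, cutoff=t1))  # (['b'], ['c'])
```

The tombstones would themselves need a retention policy: a client whose last sync is older than the oldest retained tombstone would have to fall back to a full sync.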

(This is an alternative to an API endpoint that would return a changes-feed. It is orthogonal to #4863, which should be considered independently if we want to show timestamps for the most recent journalist activity.)

Acceptance criteria

Given that I am maintaining a client that interacts with the server via the API
When I query the list of sources
Then I should be able to determine from the response whether I need to fetch updates for any given source
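Under the version-integer option, a client could meet this criterion with a diff like the following minimal sketch. It assumes a hypothetical list response in which each entry carries a source `uuid` and a `version`. A source needs fetching if it is new or its version differs from the stored one. A locally known source missing from the full list has been deleted. `plan_sync` and the response shape are illustrative, not an existing API.

```python
def plan_sync(local_versions: dict[str, int], remote: list[dict]) -> tuple[list[str], list[str]]:
    """Decide, from one list-sources response, which sources to fetch or delete.

    `remote` is a hypothetical response: [{"uuid": ..., "version": ...}, ...].
    """
    remote_versions = {item["uuid"]: item["version"] for item in remote}
    # New sources (no local version) and changed sources (different version).
    to_fetch = sorted(
        uuid for uuid, version in remote_versions.items()
        if local_versions.get(uuid) != version
    )
    # Sources we know about that the server no longer lists were deleted.
    to_delete = sorted(set(local_versions) - set(remote_versions))
    return to_fetch, to_delete


local = {"a": 3, "b": 1, "c": 7}
remote = [
    {"uuid": "a", "version": 3},
    {"uuid": "b", "version": 2},
    {"uuid": "d", "version": 1},
]
print(plan_sync(local, remote))  # (['b', 'd'], ['c'])
```

Comparing with `!=` rather than `>` means the client only needs to detect that a version differs, not reason about its ordering.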

eloquence • Apr 10 '25 21:04

If (as is under consideration elsewhere) the Client were to consume the API's JSON representations and persist them locally, another strategy for incremental updates would be:

  1. The bulk endpoint (like freedomofpress/securedrop-client#1549) for a given resource returns a list of (key, hash(value)) pairs (realistically, a {key: hash(value)} dictionary), where value is the JSON object for key.
  2. Initially, the Client requests all keys, retrieves their values, and saves their hash(value)s.
  3. Whenever a resource changes on the Server, value changes, so hash(value) changes, so the Client knows to update that key.
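The three steps above might look roughly like this, as a sketch assuming JSON-serializable resources keyed by UUID. SHA-256 over a canonical serialization (sorted keys, fixed separators) stands in for `hash()`. Python's built-in `hash()` is salted per process for strings, so it would not be stable between Server and Client. `digest`, `index`, `stale_keys`, and the sample objects are hypothetical.

```python
import hashlib
import json


def digest(value: dict) -> str:
    # Canonical serialization (sorted keys, no optional whitespace) so that
    # equal objects hash identically on Server and Client.
    canonical = json.dumps(value, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def index(resources: dict[str, dict]) -> dict[str, str]:
    """Server side (step 1): the {key: hash(value)} dictionary."""
    return {key: digest(value) for key, value in resources.items()}


def stale_keys(local: dict[str, str], remote: dict[str, str]) -> tuple[list[str], list[str]]:
    """Client side (step 3): keys to (re)fetch, and keys no longer on the Server."""
    to_fetch = sorted(k for k, h in remote.items() if local.get(k) != h)
    to_drop = sorted(set(local) - set(remote))
    return to_fetch, to_drop


server = {
    "s1": {"uuid": "s1", "is_starred": False},
    "s2": {"uuid": "s2", "is_starred": False},
}
local_hashes = index(server)       # step 2: initial full sync
server["s2"]["is_starred"] = True  # a star status change on the Server
print(stale_keys(local_hashes, index(server)))  # (['s2'], [])
```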

Pros:

  • No need to enforce a monotonic version counter on the Server: hash(value) is just hash(instance.to_json()) for a given SQLAlchemy model instance.
  • No need for logic on the Client to check a monotonic version counter. Managing the source-level submission counter that indexes conversation items is already harder than it should be; let's not add another counter.

Cons:

  • Larger on the wire: If key is a UUID, (key, hash(value)) will be roughly twice the size of (key, version) for an integer version. (But even this would only bump freedomofpress/securedrop-client#1549's example from 15 to 30 KiB, still much smaller than the current 1 MiB.)
  • More expensive to compute. But writes are rare, and we control all of them, so we can just cache hash(value) at the same time as we write value (or the data from which it's derived).
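Caching the digest at write time could be sketched as follows, assuming every write goes through a single code path (the Server controls all writes). `CachedStore` is a hypothetical stand-in for the model layer. With SQLAlchemy, the digest would presumably be a column updated in the same transaction as the row it describes. The store keeps a deep copy of each value so that later mutation by the caller cannot desynchronize value and digest.

```python
import copy
import hashlib
import json


def digest(value: dict) -> str:
    canonical = json.dumps(value, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class CachedStore:
    """Hypothetical stand-in for the model layer: value and digest written together."""

    def __init__(self) -> None:
        self._values: dict[str, dict] = {}
        self._digests: dict[str, str] = {}

    def write(self, key: str, value: dict) -> None:
        # Hash once per write (rare) instead of once per read (frequent).
        # Store a deep copy so the caller cannot mutate the value afterwards
        # and leave the cached digest stale.
        self._values[key] = copy.deepcopy(value)
        self._digests[key] = digest(self._values[key])

    def delete(self, key: str) -> None:
        self._values.pop(key, None)
        self._digests.pop(key, None)

    def bulk_index(self) -> dict[str, str]:
        # Serving the bulk endpoint only copies cached digests; nothing is rehashed.
        return dict(self._digests)


store = CachedStore()
store.write("s1", {"uuid": "s1", "is_starred": False})
before = store.bulk_index()["s1"]
store.write("s1", {"uuid": "s1", "is_starred": True})
print(store.bulk_index()["s1"] != before)  # True
```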

cfm • Apr 30 '25 23:04