funnel
funnel copied to clipboard
Additional membership record types are required
Immutable membership records are currently defined as a two-part model:
- Identification of the subject, access privileges and other associated data, with metadata on who created the record and when.
- Optional metadata on when the record was revoked, and by whom.
There is no indication on why the record was revoked, and the existence of a replacement must be guessed by the presence of another record with the exact same timestamp (transaction timestamp of revoked_at = created_at of next). This was chosen for data normalisation to ensure no duplication of data, but has not been ideal:
- The second half of the record is audit metadata and is insufficient for application requirements.
- Attempting to use it for application needs has resulted in two notification types per membership type, one for the first half (grant, accept, amend), another for the second half (revoke). This duplication has spilled over into code layout, templates and even UI (with the UI duplication only resolved in #1531).
- There is critical data missing on why revocation happened. It has to be inferred by looking at other records, and this is inefficient as it involves multiple queries in a data model that only the app is aware of, not the database.
In examining the various use cases, we've learnt that the close of a membership should be a new record, despite the duplicated metadata from the previous record's second half, and at least the following new record types are required:
expired— this was a time-limited record (a subscription or term limit) and it was identified as expired and automatically revoked. There is no actor associated with this record (granted_byhere andrevoked_byin the previous record are both null).resigned— the subject revoked their own record (could be a group of users for profile/account memberships, but the specific actor is identified viagranted_by).removed— someone else removed the subject. This is distinct fromresignedbecause of the ambiguity of whether the actor is the subject in a profile/account membership.archived— the subject's access rights have been removed, but their record remains in a current state, typically to continue to offer credit on a document. This applies to project crew for past events, and to submissions made by past employees/contractors, a use case in #1420.migrated(optional) — this used to be some other membership record in the past, but an entry in a new type of record was created due to changes in the application. This is functionally similar to the existingamend, so it may not be explicitly needed.
In addition, membership records should have a contiguity marker, either a foreign key reference to a previous record (creating a linked list), or a random UUID that is copied from record to record when invoking record.replace(). The UUID method is better for a bulk query; the linked list is convenient for finding the immediate previous record should it be needed, but is terribly inefficient for retrieving an entire chain.
One more required: apply, indicating an application for membership that must be approved by someone with the necessary roles (also an accept record). Like invite and the other proposed new record types, it will be special in not granting any roles.
This suggests there is an additional use possible for invite and apply: they can co-exist with a role-granting record, and if accepted, they replace two records at once (the existing role-granting record and the invitation or application). This allows for a workflow where an existing member has their roles changed with consent/approval.
However, since the unique constraint is enforced on the basis of revoked_at being null, that has to change to allow for this "draft of future state" record to co-exist.
Membership records have a distinct but related problem with schema migrations: since the records are supposed to be immutable, a migration that adds or removes a data column cannot do it without damaging the historical record. For example, with ProjectCrewNotification there is a proposal to drop the "usher" role entirely, and to replace "promoter" with the "profile_admin" role declared elsewhere, and in SponsorMembership to drop the "promoted" flag. We have three ways to achieve this change:
- Drop the columns, destroying the historical record.
- Keep the columns, but ignore their value. Future records will have empty columns.
- Move all these data columns into a JSON or HSTORE structure. Since the content is only relevant to the app and not the database, this allows for cleaner future records, letting the app process historical records as is relevant. A JSON store can continue to index keys that are of current interest.
The apply record type seems unnecessary as is indistinguishable from inviting self, ie:
self.record_type == MEMBERSHIP_RECORD_TYPE.INVITE and self.is_self_granted
However, this means the accept call requires a distinct role (accepter?) that is granted to whichever of subject and owner is the counterparty.
Moving data columns into a JSON store allows for the possibility of UGC that has no meaning to the server-side app but is meaningful to downstream consumers like client apps. This will require namespace separation, possibly with a mandatory prefix for internal or external uses. Options:
- Underscore prefix for app-specific data (eg:
_is_admin = True), as it usually flags something private. However, this is the only type currently allowed, and it will confuse the internal API as well. user_prefix for UGC, enforced through a form validator in future when this opens up. May be better.
Separately, the role access mechanism only has all-or-nothing access to sub-objects. This should be addressed in Coaster's RoleAccessProxy to enable such a use case, and will also solve for the open problem in RegistryMixin and StateManager.
Wanted: combine SQLAlchemy's JSONB store with dataclasses_json to provide attribute-based access and the other niceties that dataclasses have. We have a precedent with Coaster's JsonDict column type.
A dataclass interface isn't necessary. It can be a regular dict interface, and the values relevant to the app can be exposed as properties directly on the model. The properties can be responsible for providing a default when the key is missing (for eg, with an old record).
Should the data dict include a hash? Logically, it seems it should be hashed with the data values and the previous record's hash to establish a verified chain of continuity, but the app itself has no use for the hash since there is no data sync. It is fairly likely that a defect in the implementation will go undetected long enough for all the historical data to be unusable, so maybe it's best to leave this out until there is a clear use case.
However, we can prep for future uses by structuring the JSON store to not be a flat key:value system. It will need:
- A data format version number.
- Potentially hashed data in a distinct sub-dict as the hash itself will reside in the JSON dict
- Attribution for which actor changed each of the data columns and when, because now we can store all that – previously it required traversing the chain of historical records to find the pair in which one column changed.
Version identifiers also aren't very useful unless there is a well defined schema attached to each version, so we're back to defining a dataclass or a pydantic model for each version. I think we should also standardise on terms here:
Version: refers to the version of the data model Revision: refers to the revision of the data represented in the model. A new membership record is effectively a revision of the membership.
Places where we've used the term version to refer to revision should be renamed.
Further note: a schema is only required for reading older records. Since records are immutable, write operations only need to be concerned with the latest schema.
We can define a string version column and make a lookup table of schema translators from old version to current version. Problem: every time the current format changes, all of the schema translators have to be updated.
In addition, membership records should have a contiguity marker, either a foreign key reference to a previous record (creating a linked list), or a random UUID that is copied from record to record when invoking
record.replace(). The UUID method is better for a bulk query
If we use UUID1 (or UUID1MC) here, it'll also encode the timestamp and provide an immediate reference for the age of the membership, without needing to look up the chain of records. A regular timestamp column may also serve the same dual purpose, but won't have the random bits that a UUID has, so as a membership identifier it will need another column for disambiguation.
- Coaster has a
uuid1mcfunction that generates a UUID1 with a randomised MAC id, thereby not encoding the server's MAC id into the data. - The UUID timestamp can be used as a sort key in Python via
uuid1.time. Full example:sorted(uuidlist, key=lambda uuid1: uuid1.time). - Ordering can be done in PostgreSQL for greater efficiency using a custom function as described here: https://stackoverflow.com/a/61508178/78903. We've generally avoided PostgreSQL functions because they don't have an easy deployment mechanism. Every change requires a database migration. However, this should be an acceptable overhead for functions that are unlikely to ever need a change.