Redesign schema modification capabilities
Usage and product changes
We redesign schema modification to allow much more flexible in-place changes to the database schema. We relax various schema invariants within a schema write transaction, so that schema types can be moved and edited on the fly. However, the data is validated against the schema for consistency at each step, so TypeDB's existing Concept and Query APIs remain fully and safely usable. Before committing, schema invariants must be restored, guided by TypeDB's schema exceptions API (`ConceptManager.getSchemaExceptions()`).
Expected schema migration workflow
This change facilitates large-scale database schema migration. We expect the following workflow to be adopted:
- Open a schema session and a write transaction. This blocks writes anywhere on the system.
- Mutate the schema incrementally. Mutations that expand the schema are always possible and cheap; mutations that restrict the schema are validated against the existing data for conformance to the new schema. Every schema state you move through must be consistent with the current state of the data.
  a. If your data does not fit the new schema state, in 2.x you will get an exception on commit and the transaction will be rolled back. You must open a data session and write transaction, mutate the data into the expected shape, and commit. Then return to a schema session and write transaction and retry the schema mutation.
  b. In TypeDB 3.0, these operations will be possible within a single schema write transaction, smoothing out the schema migration workflow.
- To make schema migration simpler, some schema invariants are relaxed within a schema write transaction:
  a. Dangling overrides are allowed: overridden types (`... as TYPE`) may refer to types that are not overridable at that place in the schema. This is common when moving a type from one supertype to a different supertype.
  b. Redeclarations are allowed: declarations of `owns`, `plays`, or annotations may be duplicated in child types. This facilitates moving types from one supertype to a different supertype, or moving declarations up or down the type hierarchy.
  c. Relaxed abstract ownership: types may own abstract attribute types without themselves being abstract.
- All of these invariants must be restored before commit, or the transaction will fail and the changes will be rolled back. To retrieve the set of errors that must be fixed before commit, use the API `ConceptManager.getSchemaExceptions()` (`transaction.concepts().getSchemaExceptions()` in most drivers).
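The 2.x retry loop in the workflow above (the schema commit fails, the data is migrated in a separate transaction, the schema change is retried) can be sketched with a toy in-memory model. This is a minimal sketch under stated assumptions: `ToyDatabase`, `commitUndefines`, `deleteOwnershipInstances`, and `MigrationWorkflow` are hypothetical names for illustration only, not the TypeDB driver API, and the "transactions" are plain method calls.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical in-memory stand-in for a database; NOT the TypeDB driver API.
// It only models the rule that restricting schema changes must match existing data.
class ToyDatabase {
    // Ownerships the schema allows, encoded as "ownerType:attributeType".
    final Set<String> allowedOwnerships = new HashSet<>();
    // Stored ownership instances, encoded the same way (one entry per instance).
    final List<String> ownershipInstances = new ArrayList<>();

    // "Schema transaction commit": applies staged undefines atomically.
    // If any staged change conflicts with existing data, nothing is applied (rollback).
    void commitUndefines(List<String> staged) {
        for (String key : staged) {
            if (ownershipInstances.contains(key)) {
                throw new IllegalStateException("Data still uses " + key + "; rolled back");
            }
        }
        allowedOwnerships.removeAll(staged);
    }

    // "Data transaction": migrates the data by deleting instances of an ownership.
    void deleteOwnershipInstances(String key) {
        ownershipInstances.removeIf(key::equals);
    }
}

class MigrationWorkflow {
    // Mirrors the 2.x loop: schema commit fails -> fix data -> retry the schema change.
    // Returns the number of schema commit attempts that were needed.
    static int commitWithRetry(ToyDatabase db, List<String> staged) {
        for (int attempt = 1; ; attempt++) {
            try {
                db.commitUndefines(staged);
                return attempt;
            } catch (IllegalStateException rolledBack) {
                if (attempt >= 2) {
                    throw rolledBack;
                }
                // Bring the data into the shape the new schema expects, then retry.
                staged.forEach(db::deleteOwnershipInstances);
            }
        }
    }
}
```

Applying the staged change all-or-nothing mirrors the rollback on commit; in the toy, a second failed attempt is rethrown rather than retried indefinitely.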
Operations that expand the schema capabilities:
- Adding a new type
- Adding a `plays` or `owns` declaration
- Removing an override
- Removing an annotation
- Removing abstractness
Operations that restrict the schema capabilities:
- Removing a type
- Removing a `plays` or `owns` declaration
- Adding an override
- Adding an annotation
- Adding abstractness
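The two lists above amount to a classification that decides whether a migration step must be checked against existing data. A minimal sketch, assuming a hypothetical `SchemaOperation` enum that does not exist in TypeDB:

```java
import java.util.List;

// Hypothetical classification of the operations listed above; not a TypeDB type.
enum SchemaOperation {
    ADD_TYPE(true),
    ADD_PLAYS_OR_OWNS(true),
    REMOVE_OVERRIDE(true),
    REMOVE_ANNOTATION(true),
    REMOVE_ABSTRACTNESS(true),
    REMOVE_TYPE(false),
    REMOVE_PLAYS_OR_OWNS(false),
    ADD_OVERRIDE(false),
    ADD_ANNOTATION(false),
    ADD_ABSTRACTNESS(false);

    private final boolean expanding;

    SchemaOperation(boolean expanding) {
        this.expanding = expanding;
    }

    // Expanding operations cannot invalidate existing data.
    boolean expandsSchema() {
        return expanding;
    }

    // Restricting operations must be validated against existing instances.
    boolean requiresDataValidation() {
        return !expanding;
    }

    // Counts how many steps of a migration plan will trigger a check against the data.
    static long dataChecks(List<SchemaOperation> plan) {
        return plan.stream().filter(SchemaOperation::requiresDataValidation).count();
    }
}
```

For example, a plan of `ADD_TYPE`, `ADD_OVERRIDE`, and `REMOVE_TYPE` triggers two data checks, because only the last two restrict the schema.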
Implementation
- Implements a new `SubtypeValidation` class with static methods for validating the integrity of the subtrees of the types modified by a schema operation.
- Validation of whether a schema modification leaves the modified type itself in a valid state with respect to its ancestors is done, as before, in the modification functions.
- The ancestors do not require validation, as their integrity is unaffected.
- Moves some existing validation of declarations into `DeclarationValidation`.
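The subtree-only validation described above can be illustrated with a toy hierarchy. This sketch assumes a hypothetical `ToyTypeHierarchy` class and an arbitrary `Predicate<String>` standing in for a real integrity check; it is not the `SubtypeValidation` implementation. It collects the modified type and its transitive subtypes breadth-first and never walks up to ancestors.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical toy type hierarchy; not TypeDB's SubtypeValidation class.
class ToyTypeHierarchy {
    private final Map<String, List<String>> children = new HashMap<>();

    void addSubtype(String supertype, String subtype) {
        children.computeIfAbsent(supertype, k -> new ArrayList<>()).add(subtype);
    }

    // The modified type followed by all of its transitive subtypes, breadth-first.
    // Ancestors are never visited: a change to a type cannot affect its supertypes.
    List<String> subtree(String modified) {
        List<String> result = new ArrayList<>();
        Deque<String> queue = new ArrayDeque<>();
        queue.add(modified);
        while (!queue.isEmpty()) {
            String type = queue.poll();
            result.add(type);
            queue.addAll(children.getOrDefault(type, List.of()));
        }
        return result;
    }

    // Runs one integrity check over the subtree and returns the types that fail it.
    List<String> validateSubtree(String modified, Predicate<String> isValid) {
        List<String> failures = new ArrayList<>();
        for (String type : subtree(modified)) {
            if (!isValid.test(type)) {
                failures.add(type);
            }
        }
        return failures;
    }
}
```

With `person` under `entity` and `employee` and `student` under `person`, modifying `person` visits `person`, `employee`, and `student`, but not `entity`.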
Notes
Only a valid schema can be committed. Validation covers the following properties, each of which is checked either 1) immediately, at operation time, or 2) deferred, when requested via `transaction.concepts().getSchemaExceptions()` or on commit:
Immediate validation:
- A relation/entity/attribute type may only subtype another relation/entity/attribute type, respectively.
- Further, attribute types must have the same value type as their supertype (unless their supertype is the root attribute type).
- A type which is referenced in a rule may not be deleted.
- The data is always valid with respect to the schema.
- Only abstract attribute types can have subtypes.
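A minimal sketch of the kind and value-type checks run at operation time when a type is given a new supertype. It assumes a hypothetical `ToyType` record in which `isRoot` marks the root attribute type; the rule-reference and data-validity checks from the list are not modelled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

enum ToyKind { ENTITY, RELATION, ATTRIBUTE }

// Hypothetical schema type model; valueType is null for entity and relation types.
record ToyType(String label, ToyKind kind, String valueType, boolean isAbstract, boolean isRoot) {}

class ImmediateChecks {
    // Checks applied when 'type' is given 'supertype'; an empty list means the change is allowed.
    static List<String> checkSetSupertype(ToyType type, ToyType supertype) {
        List<String> errors = new ArrayList<>();
        if (type.kind() != supertype.kind()) {
            errors.add(type.label() + " cannot subtype " + supertype.label() + ": kinds differ");
            return errors; // the attribute checks below only apply when kinds match
        }
        if (type.kind() == ToyKind.ATTRIBUTE && !supertype.isRoot()) {
            if (!Objects.equals(type.valueType(), supertype.valueType())) {
                errors.add(type.label() + " must have value type " + supertype.valueType());
            }
            if (!supertype.isAbstract()) {
                errors.add(supertype.label() + " must be abstract to have subtypes");
            }
        }
        return errors;
    }
}
```

Subtyping the root attribute type is always accepted here, matching the exception for the root in the value-type rule.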
Deferred validation:
These checks are deferred to commit time so that certain schema modifications can be performed with data in place.
- A `plays`/`owns` declaration may only override a type which is both a) a supertype of the role type/attribute type being declared as played/owned, and b) a type which would otherwise be played/owned via inheritance.
- There are no redundant redeclarations of `owns`/`plays` in the schema:
  - A redeclaration of an inherited `plays` is always redundant.
  - A redeclaration of an inherited ownership is redundant if it does not make the annotations more strict.
- A non-abstract entity/relation/attribute type may not 'own' an abstract attribute type.
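The deferred checks can be pictured as a pass over the whole schema that collects violations instead of throwing, in the spirit of `getSchemaExceptions()`. In this sketch `ToySchema` and its fields are hypothetical, override declarations (`as`) are ignored, and "more strict annotations" is simplified to "adds at least one annotation not already inherited".

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical toy schema for illustrating deferred checks; not TypeDB's ConceptManager.
class ToySchema {
    final Map<String, String> supertypeOf = new HashMap<>();            // type -> supertype
    final Map<String, Map<String, Set<String>>> owns = new HashMap<>(); // type -> attribute -> annotations
    final Map<String, Set<String>> plays = new HashMap<>();             // type -> role types
    final Set<String> abstractTypes = new HashSet<>();

    void declareOwns(String type, String attribute, String... annotations) {
        owns.computeIfAbsent(type, k -> new HashMap<>()).put(attribute, Set.of(annotations));
    }

    void declarePlays(String type, String role) {
        plays.computeIfAbsent(type, k -> new HashSet<>()).add(role);
    }

    // Annotations of the nearest ancestor's ownership of 'attribute', or null if not inherited.
    Set<String> inheritedOwns(String type, String attribute) {
        for (String t = supertypeOf.get(type); t != null; t = supertypeOf.get(t)) {
            Set<String> annotations = owns.getOrDefault(t, Map.of()).get(attribute);
            if (annotations != null) return annotations;
        }
        return null;
    }

    boolean inheritsPlays(String type, String role) {
        for (String t = supertypeOf.get(type); t != null; t = supertypeOf.get(t)) {
            if (plays.getOrDefault(t, Set.of()).contains(role)) return true;
        }
        return false;
    }

    // Collects every violation instead of failing fast, so all can be fixed before commit.
    List<String> schemaExceptions() {
        List<String> errors = new ArrayList<>();
        owns.forEach((type, declared) -> declared.forEach((attribute, annotations) -> {
            Set<String> inherited = inheritedOwns(type, attribute);
            // Redundant unless the redeclaration adds an annotation the inherited one lacks.
            if (inherited != null && inherited.containsAll(annotations)) {
                errors.add("Redundant owns " + attribute + " on " + type);
            }
            if (abstractTypes.contains(attribute) && !abstractTypes.contains(type)) {
                errors.add("Non-abstract " + type + " owns abstract " + attribute);
            }
        }));
        // Any redeclaration of an inherited plays is redundant.
        plays.forEach((type, roles) -> roles.forEach(role -> {
            if (inheritsPlays(type, role)) errors.add("Redundant plays " + role + " on " + type);
        }));
        return errors;
    }
}
```

Because every violation is collected rather than thrown, a migration can fix them in any order and re-run the check until the list is empty, which is how the relaxed invariants are meant to be restored before commit.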
Data integrity invariants
There are never instances of types, ownerships or relations which are not allowed in the schema. Violations are:
- A vertex exists for a type that was deleted
- An instance of a relation type 'relates' a role player via a role type that it does not declare or inherit, or via a role type hidden by an overriding 'relates' declaration
- An instance of an entity/relation/attribute type 'owns' an attribute of a type that it does not declare or inherit, or of a type hidden by an overriding 'owns' declaration
- An instance of an entity/relation/attribute type 'plays' a role of a type that it does not declare or inherit, or of a type hidden by an overriding 'plays' declaration
- An ownership violates its `@key` or `@unique` constraint.
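Checking that an ownership instance is allowed requires the owner type's effective ownerships: everything declared on the type or inherited, minus attribute types hidden by an override lower in the hierarchy. A minimal sketch, assuming hypothetical `Ownership` and `ToyOwnershipRules` types and exact attribute-type matching (attribute subtyping is not modelled):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical instance of "ownerType owns attributeType"; not TypeDB's storage format.
record Ownership(String ownerType, String attributeType) {}

class ToyOwnershipRules {
    final Map<String, String> supertypeOf = new HashMap<>();
    // type -> (declared attribute type -> attribute type it overrides, or null if none)
    final Map<String, Map<String, String>> declaredOwns = new HashMap<>();

    void owns(String type, String attribute, String overridden) {
        declaredOwns.computeIfAbsent(type, k -> new HashMap<>()).put(attribute, overridden);
    }

    // Declared plus inherited ownerships, minus those hidden by an override further down.
    Set<String> effectiveOwns(String type) {
        Set<String> allowed = new HashSet<>();
        Set<String> hidden = new HashSet<>();
        for (String t = type; t != null; t = supertypeOf.get(t)) {
            Map<String, String> declared = declaredOwns.getOrDefault(t, Map.of());
            for (String attribute : declared.keySet()) {
                if (!hidden.contains(attribute)) allowed.add(attribute);
            }
            // An override declared at this level hides the overridden type from every ancestor.
            for (String overridden : declared.values()) {
                if (overridden != null) hidden.add(overridden);
            }
        }
        return allowed;
    }

    // Ownership instances whose owner type does not allow the attribute type.
    List<Ownership> violations(List<Ownership> instances) {
        List<Ownership> bad = new ArrayList<>();
        for (Ownership o : instances) {
            if (!effectiveOwns(o.ownerType()).contains(o.attributeType())) bad.add(o);
        }
        return bad;
    }
}
```

For example, if `person` owns `name` and `employee` (a subtype of `person`) owns `staff-name` as `name`, an `employee` instance owning a plain `name` is reported, while a `person` owning a `name` is not.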
This is validated when:
- Undefining types or roles
- Undefining `owns` and `plays` declarations
- Changing the supertype of a given type
- Introducing an override which causes a role or attribute to be "hidden".
- Setting a type to be abstract.
- Increasing the strictness of annotations on an ownership.
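Increasing annotation strictness means re-checking the existing ownerships. A minimal sketch of the `@unique` and `@key` checks over the ownership edges of a single `owns` declaration, assuming a hypothetical `OwnsEdge` record of owner ID and attribute value, and treating `@key` as `@unique` plus exactly one value per owner:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical ownership edge, e.g. owner "p1" owns the attribute value "a@example.com".
record OwnsEdge(String owner, String value) {}

class AnnotationChecks {
    // @unique: no attribute value may be owned by more than one owner.
    static List<String> uniqueViolations(List<OwnsEdge> edges) {
        Map<String, Set<String>> ownersByValue = new HashMap<>();
        for (OwnsEdge e : edges) {
            ownersByValue.computeIfAbsent(e.value(), k -> new HashSet<>()).add(e.owner());
        }
        List<String> errors = new ArrayList<>();
        ownersByValue.forEach((value, owners) -> {
            if (owners.size() > 1) errors.add("value " + value + " owned by " + owners.size() + " owners");
        });
        return errors;
    }

    // @key: the @unique check, plus every owner instance must own exactly one value.
    static List<String> keyViolations(Set<String> owners, List<OwnsEdge> edges) {
        List<String> errors = new ArrayList<>(uniqueViolations(edges));
        Map<String, Integer> countByOwner = new HashMap<>();
        for (OwnsEdge e : edges) {
            countByOwner.merge(e.owner(), 1, Integer::sum);
        }
        for (String owner : owners) {
            int count = countByOwner.getOrDefault(owner, 0);
            if (count != 1) errors.add(owner + " owns " + count + " key values");
        }
        return errors;
    }
}
```

Tightening an ownership from no annotation to `@unique`, or from `@unique` to `@key`, would only be accepted when the corresponding list is empty.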
PR Review Checklist
Do not edit the content of this comment. The PR reviewer should simply update this comment by ticking each review item below, as they get completed.
Trivial Change
- [ ] This change is trivial and does not require a code or architecture review.
Code
- [x] Packages, classes, and methods have a single domain of responsibility.
- [x] Packages, classes, and methods are grouped into cohesive and consistent domain model.
- [x] The code is canonical and the minimum required to achieve the goal.
- [x] Modules, libraries, and APIs are easy to use, robust (foolproof and not error-prone), and tested.
- [x] Logic and naming have a clear narrative that communicates the accurate intent and responsibility of each module (e.g. method, class, etc.).
- [x] The code is algorithmically efficient and scalable for the whole application.
Architecture
- [x] Any required refactoring is completed, and the architecture does not introduce technical debt incidentally.
- [x] Any required build and release automations are updated and/or implemented.
- [x] Any new components follow a consistent style with respect to the pre-existing codebase.
- [x] The architecture intuitively reflects the application domain, and is easy to understand.
- [x] The architecture has a well-defined hierarchy of encapsulated components.
- [x] The architecture is extensible and scalable.