typedb: Redesign schema modification capabilities

Open: krishnangovindraj opened this issue 1 year ago • 1 comment

Usage and product changes

We redesign schema modification to allow much more flexible in-place changes to the database schema. Within a schema write transaction we relax several schema invariants, so that schema types can be moved and edited on the fly. The data, however, is validated against the schema at every step, so TypeDB's existing Concept and Query APIs remain fully and safely usable. Before committing, the relaxed invariants must be restored, guided by TypeDB's exceptions API (ConceptManager.getSchemaExceptions()).

Expected schema migration workflow

This change facilitates large-scale database schema migration. We expect the following workflow to be adopted:

1. Open a schema session and a write transaction. This blocks writes anywhere on the system.
2. Mutate the schema incrementally. Mutations that expand the schema are always possible and cheap; mutations that restrict the schema are validated against the existing data for conformance to the new schema. Every schema state you move through must be consistent with the current state of the data.
   a. If your data does not fit the new schema state, in 2.x you will get an exception on commit and the transaction will be rolled back. You must then open a data session and transaction, mutate the data into the expected shape, and commit. Then return to a schema session and transaction and retry the schema mutation.
   b. In TypeDB 3.0 these operations will all be possible within a single schema write transaction, smoothing out the schema migration workflow.
3. To make schema migration simpler, some schema invariants are relaxed within a schema write transaction:
   a. Dangling overrides are allowed: overridden types (... as TYPE) may refer to types that are not overridable at that point in the schema. This is common when moving a type from one supertype to another.
   b. Redeclarations are allowed: owns, plays, and annotation declarations may be duplicated in child types. This facilitates moving types from one supertype to another, or moving declarations up or down the type hierarchy.
   c. Relaxed abstract ownership: types may own abstract attribute types without themselves being abstract.
4. All of these invariants must be restored before commit; otherwise the commit fails and the changes are rolled back. To retrieve the set of errors that must be fixed before commit, use ConceptManager.getSchemaExceptions() (transaction.concepts().getSchemaExceptions() in most drivers).
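The workflow above can be illustrated with a toy, in-memory model. This is a minimal sketch, not TypeDB's API: ToySchemaTx, its methods, and SchemaValidationError are hypothetical names. It shows the three behaviours described: an expanding operation (adding an ownership) is applied without touching data, a restricting operation (removing an ownership) is checked against the data immediately, and a relaxed invariant (a non-abstract type owning an abstract attribute type) is only reported by get_schema_exceptions() and enforced at commit. The separate data transaction of step 2a is not modeled.

```python
from dataclasses import dataclass, field


class SchemaValidationError(Exception):
    """Raised by the toy model when data or deferred invariants are violated."""


@dataclass
class ToySchemaTx:
    """Hypothetical stand-in for a schema write transaction (not TypeDB's API)."""
    owns: dict = field(default_factory=dict)       # owner type -> declared attribute types
    abstract: set = field(default_factory=set)     # names of abstract types
    data_owns: dict = field(default_factory=dict)  # owner type -> attribute types used by instances
    committed: bool = False

    def add_owns(self, owner, attr):
        # Expanding operation: always allowed, no data check needed.
        self.owns.setdefault(owner, set()).add(attr)

    def remove_owns(self, owner, attr):
        # Restricting operation: validated against the data immediately.
        if attr in self.data_owns.get(owner, set()):
            raise SchemaValidationError(f"instances of {owner} still own {attr}")
        self.owns.get(owner, set()).discard(attr)

    def unset_abstract(self, type_name):
        # Expanding operation: removing abstractness never invalidates data.
        self.abstract.discard(type_name)

    def get_schema_exceptions(self):
        # Deferred check, relaxed inside the transaction: a non-abstract
        # type may not own an abstract attribute type.
        return [
            f"{owner} is not abstract but owns abstract {attr}"
            for owner, attrs in sorted(self.owns.items())
            if owner not in self.abstract
            for attr in sorted(attrs)
            if attr in self.abstract
        ]

    def commit(self):
        # Only a schema with no outstanding deferred violations can be committed.
        errors = self.get_schema_exceptions()
        if errors:
            raise SchemaValidationError("; ".join(errors))
        self.committed = True


tx = ToySchemaTx(abstract={"name"}, data_owns={"person": {"email"}})
tx.add_owns("person", "name")      # allowed, but leaves a deferred violation
print(tx.get_schema_exceptions())  # ['person is not abstract but owns abstract name']
tx.unset_abstract("name")          # restore the invariant before committing
tx.commit()
print(tx.committed)                # True
```

Returning a list instead of raising on the first problem mirrors the idea that the user fixes every reported violation before attempting the commit.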

Operations that expand the schema capabilities:

Adding a new type
Adding a plays or owns
Removing an override
Removing an annotation
Removing abstractness

Operations that restrict the schema capabilities:

Removing a type
Removing a plays or owns
Adding an override
Adding an annotation
Adding abstractness
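The two lists are mirror images: every expanding operation has a restricting counterpart. When planning a migration, a small lookup table makes it explicit which steps will be validated against existing data. The operation labels and helper functions below are hypothetical; this is only a sketch of that classification.

```python
from enum import Enum


class Effect(Enum):
    EXPANDS = "expands"      # always possible and cheap
    RESTRICTS = "restricts"  # validated against existing data


# Keys are illustrative labels for the operations listed above,
# not TypeDB identifiers.
SCHEMA_OPERATION_EFFECTS = {
    "add_type": Effect.EXPANDS,
    "add_plays_or_owns": Effect.EXPANDS,
    "remove_override": Effect.EXPANDS,
    "remove_annotation": Effect.EXPANDS,
    "remove_abstract": Effect.EXPANDS,
    "remove_type": Effect.RESTRICTS,
    "remove_plays_or_owns": Effect.RESTRICTS,
    "add_override": Effect.RESTRICTS,
    "add_annotation": Effect.RESTRICTS,
    "add_abstract": Effect.RESTRICTS,
}


def needs_data_validation(operation: str) -> bool:
    """True if the operation shrinks what the schema allows, so data must be rechecked."""
    return SCHEMA_OPERATION_EFFECTS[operation] is Effect.RESTRICTS


def data_checked_steps(plan):
    """Return, in order, the steps of a migration plan that will be checked against data."""
    return [op for op in plan if needs_data_validation(op)]


plan = ["add_type", "add_plays_or_owns", "remove_override", "remove_plays_or_owns"]
print(data_checked_steps(plan))  # ['remove_plays_or_owns']
```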

Implementation

Implements a new SubtypeValidation class with static methods that validate the integrity of the subtrees of the types modified by a schema operation.
- Validation that the modification leaves the type itself in a valid state with respect to its ancestors is still done, as before, in the modification functions.
- The ancestors do not require validation, as their integrity is unaffected.
Moves some existing declaration validation into DeclarationValidation.
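The subtree-only strategy can be illustrated with a breadth-first walk over the subtypes of the modified type. This is a minimal sketch over a plain supertype map, not the SubtypeValidation class itself; subtypes_of, validate_subtree, and the check callback are hypothetical names, and the callback stands in for whatever per-type rules a given operation needs.

```python
from collections import deque


def subtypes_of(supertype_of: dict, root: str) -> list:
    """Return `root` and all of its transitive subtypes in breadth-first order.

    `supertype_of` maps each type name to its direct supertype (None for a root).
    """
    children = {}
    for t, sup in supertype_of.items():
        children.setdefault(sup, []).append(t)
    order, queue = [], deque([root])
    while queue:
        t = queue.popleft()
        order.append(t)
        queue.extend(sorted(children.get(t, [])))
    return order


def validate_subtree(supertype_of: dict, modified: str, check) -> list:
    """Run check(type) on the modified type and its subtypes only.

    Ancestors are skipped: a change at `modified` cannot affect what they
    declare or inherit.
    """
    errors = []
    for t in subtypes_of(supertype_of, modified):
        errors.extend(check(t))
    return errors


hierarchy = {"entity": None, "person": "entity", "employee": "person",
             "student": "person", "company": "entity"}
print(validate_subtree(hierarchy, "person", lambda t: [f"checked {t}"]))
# ['checked person', 'checked employee', 'checked student']
```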

Notes

Only a valid schema can be committed. The validation considers the following properties, each of which is checked either 1) immediately, at operation time, or 2) deferred (on request via transaction.concepts().getSchemaExceptions(), or on commit):

Immediate validation:

A relation/entity/attribute type may only subtype another relation/entity/attribute type, respectively.
- Further, attribute types must have the same value type as their supertype (unless their supertype is the root attribute type)
A type which is referenced in a rule may not be deleted.
The data is always valid with respect to the schema.
Only abstract attribute types can have subtypes.
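The first, second, and fourth immediate checks can be sketched as guard functions that raise before an operation is applied. This is a toy model under stated assumptions: TypeInfo, check_set_supertype, check_delete, and SchemaError are hypothetical names, rules are represented only by the set of type labels they mention, and the data check is not modeled here.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Set


class SchemaError(Exception):
    """Raised immediately, at operation time."""


@dataclass
class TypeInfo:
    kind: str                         # "entity", "relation" or "attribute"
    value_type: Optional[str] = None  # only meaningful for attribute types
    abstract: bool = False
    is_root: bool = False             # True for the root attribute type


def check_set_supertype(sub: TypeInfo, sup: TypeInfo) -> None:
    """Reject making `sup` the supertype of `sub` if an immediate rule breaks."""
    if sub.kind != sup.kind:
        raise SchemaError(f"a {sub.kind} type cannot subtype a {sup.kind} type")
    if sub.kind == "attribute":
        # Value types must match, unless the supertype is the root attribute type.
        if not sup.is_root and sub.value_type != sup.value_type:
            raise SchemaError(f"value type {sub.value_type} does not match {sup.value_type}")
        if not sup.abstract:
            raise SchemaError("only abstract attribute types can have subtypes")


def check_delete(type_name: str, rules: Dict[str, Set[str]]) -> None:
    """Reject deleting a type that any rule still references.

    `rules` maps a rule label to the type labels that the rule mentions.
    """
    users = sorted(label for label, types in rules.items() if type_name in types)
    if users:
        raise SchemaError(f"{type_name} is referenced by rules: {', '.join(users)}")


root = TypeInfo("attribute", abstract=True, is_root=True)
name = TypeInfo("attribute", value_type="string", abstract=True)
check_set_supertype(name, root)  # accepted: value type is not compared with the root
check_set_supertype(TypeInfo("attribute", value_type="string"), name)  # accepted
```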

Deferred validation:

These checks are deferred to commit time to allow certain schema modifications to be performed with data in place.

A plays/owns declaration may only override a type which is a) a supertype of the role type/attribute type being declared as played/owned, and b) a type which would otherwise be played/owned via inheritance.
There are no redundant redeclarations of owns/plays in the schema:
- A redeclaration of an inherited plays is always redundant
- A redeclaration of an inherited ownership is redundant if it does not make the annotations more strict.
A non-abstract entity/relation/attribute type may not 'own' an abstract attribute type.
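The deferred checks can be sketched as a single pass that collects messages instead of raising, in the spirit of getSchemaExceptions() returning the set of errors still to be fixed. This sketch assumes a toy schema in which owner types and attribute types live in one dict of hypothetical ToyType records. "More strict" is simplified to "adds at least one annotation the inherited ownership lacks" (so @key implying @unique is not modeled), and overrides on plays are ignored.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, List, Optional, Set


@dataclass
class ToyType:
    supertype: Optional[str] = None
    abstract: bool = False
    # attribute type -> annotations on that ownership, e.g. frozenset({"@key"})
    owns: Dict[str, FrozenSet[str]] = field(default_factory=dict)
    # "owns X as Y" is stored as {X: Y}
    owns_overrides: Dict[str, str] = field(default_factory=dict)
    plays: Set[str] = field(default_factory=set)


def ancestors(schema: Dict[str, ToyType], name: str) -> List[str]:
    """Proper supertypes of `name`, nearest first."""
    out, cur = [], schema[name].supertype
    while cur is not None:
        out.append(cur)
        cur = schema[cur].supertype
    return out


def inherited_owns(schema, name):
    """Ownerships `name` would inherit, respecting overrides declared by ancestors."""
    result, hidden = {}, set()
    for anc in ancestors(schema, name):
        for attr, ann in schema[anc].owns.items():
            if attr not in hidden and attr not in result:
                result[attr] = ann
        hidden.update(schema[anc].owns_overrides.values())
    return result


def schema_exceptions(schema):
    """Collect violations of the deferred invariants instead of raising."""
    errors = []
    for name in sorted(schema):
        t = schema[name]
        inherited = inherited_owns(schema, name)
        inherited_plays = set().union(*(schema[a].plays for a in ancestors(schema, name)))
        for attr, overridden in sorted(t.owns_overrides.items()):
            # (a) a supertype of the declared attribute type, (b) otherwise inherited.
            if overridden not in ancestors(schema, attr) or overridden not in inherited:
                errors.append(f"{name}: dangling override 'owns {attr} as {overridden}'")
        for attr, ann in sorted(t.owns.items()):
            # Redundant unless it adds an annotation the inherited ownership lacks.
            if attr in inherited and ann <= inherited[attr]:
                errors.append(f"{name}: redundant redeclaration of 'owns {attr}'")
        for role in sorted(t.plays & inherited_plays):
            errors.append(f"{name}: redundant redeclaration of 'plays {role}'")
        if not t.abstract:
            visible = set(t.owns) | (set(inherited) - set(t.owns_overrides.values()))
            for attr in sorted(visible):
                if schema[attr].abstract:
                    errors.append(f"{name}: non-abstract type owns abstract {attr}")
    return errors


schema = {
    "name": ToyType(abstract=True),
    "full-name": ToyType(supertype="name"),
    "email": ToyType(),
    "person": ToyType(abstract=True, owns={"name": frozenset(), "email": frozenset()},
                      plays={"employment:employee"}),
    "employee": ToyType(supertype="person",
                        owns={"full-name": frozenset(), "email": frozenset()},
                        owns_overrides={"full-name": "name"},
                        plays={"employment:employee"}),
}
for error in schema_exceptions(schema):
    print(error)
```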

Data integrity invariants

There are never instances of types, ownerships, or relations that the schema does not allow. Violations are:

A vertex exists for a type that was deleted.
An instance of a relation type 'relates' an instance of a role type that it neither declares nor inherits, or that is hidden by an overriding 'relates' declaration.
An instance of an entity/relation/attribute type 'owns' an attribute of a type that it neither declares nor inherits, or that is hidden by an overriding 'owns' declaration.
An instance of an entity/relation/attribute type 'plays' a role of a type that it neither declares nor inherits, or that is hidden by an overriding 'plays' declaration.
An ownership violates its @key or @unique constraint.
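The ownership rule can be checked by recomputing, for each instance, the attribute types its type may own. A minimal sketch under stated assumptions: the schema is reduced to three plain dicts (supertype_of, declared, and hides, where hides[T] holds every Y for which T declares "owns X as Y"), ownerships are (instance id, instance type, attribute type) triples, and the function names are hypothetical. The relates and plays rules would follow the same pattern.

```python
from typing import Dict, Iterable, List, Optional, Set, Tuple


def effective_owns(supertype_of: Dict[str, Optional[str]],
                   declared: Dict[str, Set[str]],
                   hides: Dict[str, Set[str]],
                   type_name: str) -> Set[str]:
    """Attribute types `type_name` may own: declared or inherited, minus hidden ones."""
    allowed: Set[str] = set()
    hidden: Set[str] = set()
    t: Optional[str] = type_name
    while t is not None:
        # Declarations at `t` are visible unless a type below `t` hid them.
        allowed |= declared.get(t, set()) - hidden
        # An override at `t` hides the overridden type from t's ancestors.
        hidden |= hides.get(t, set())
        t = supertype_of[t]
    return allowed


def ownership_violations(supertype_of, declared, hides,
                         ownerships: Iterable[Tuple[str, str, str]]) -> List[str]:
    """Report instances of deleted types and ownerships the schema no longer allows."""
    errors = []
    for iid, itype, attr in ownerships:
        if itype not in supertype_of:
            errors.append(f"{iid}: instance of deleted type {itype}")
        elif attr not in effective_owns(supertype_of, declared, hides, itype):
            errors.append(f"{iid}: {itype} may not own {attr}")
    return errors


supertype_of = {"person": None, "employee": "person"}
declared = {"person": {"name"}, "employee": {"full-name"}}
hides = {"employee": {"name"}}  # employee owns full-name as name
data = [("p1", "person", "name"), ("e1", "employee", "name"), ("r1", "robot", "name")]
for error in ownership_violations(supertype_of, declared, hides, data):
    print(error)
```

Here p1 is valid, e1 owns an attribute type hidden by the override, and r1 belongs to a type that no longer exists, so introducing the override or deleting the type would be rejected while such data remains.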

This is validated when:

Undefining types or roles
Undefining owns & plays declarations
Changing the supertype of a given type
Introducing an override which causes a role or attribute to be "hidden".
Setting a type to be abstract.
Increasing the strictness of annotations on an ownership.
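Tightening an ownership to @unique or @key is a restricting operation, so the existing data has to be scanned before the annotation is accepted. A minimal sketch, assuming @unique means no attribute value is owned by two different owners and @key means unique plus exactly one value per owner; annotation_violations and its parameters are hypothetical names.

```python
from collections import Counter, defaultdict


def annotation_violations(annotation, owners, ownerships):
    """Check existing data before tightening an ownership to @unique or @key.

    owners: ids of all instances of the owning type (including subtypes).
    ownerships: (owner_id, attribute_value) pairs for this ownership.
    """
    errors = []
    owners_by_value = defaultdict(set)
    for owner, value in ownerships:
        owners_by_value[value].add(owner)
    # Uniqueness: each value may be owned by at most one owner.
    for value, holders in sorted(owners_by_value.items()):
        if len(holders) > 1:
            errors.append(f"value {value!r} owned by {sorted(holders)}")
    if annotation == "@key":
        # Key: additionally, every owner must own exactly one value.
        counts = Counter(owner for owner, _ in set(ownerships))
        for owner in sorted(owners):
            if counts[owner] != 1:
                errors.append(f"{owner} owns {counts[owner]} key values, expected 1")
    return errors


owners = {"p1", "p2", "p3"}
emails = [("p1", "a@x.org"), ("p2", "a@x.org"), ("p2", "b@x.org")]
print(annotation_violations("@unique", owners, emails))
print(annotation_violations("@key", owners, emails))
```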

Feb 06 '24 17:02 krishnangovindraj

PR Review Checklist

Do not edit the content of this comment. The PR reviewer should simply update this comment by ticking each review item below, as they get completed.

Trivial Change

[ ] This change is trivial and does not require a code or architecture review.

Code

[x] Packages, classes, and methods have a single domain of responsibility.
[x] Packages, classes, and methods are grouped into cohesive and consistent domain model.
[x] The code is canonical and the minimum required to achieve the goal.
[x] Modules, libraries, and APIs are easy to use, robust (foolproof and not error-prone), and tested.
[x] Logic and naming have a clear narrative that communicates the accurate intent and responsibility of each module (e.g. method, class, etc.).
[x] The code is algorithmically efficient and scalable for the whole application.

Architecture

[x] Any required refactoring is completed, and the architecture does not introduce technical debt incidentally.
[x] Any required build and release automations are updated and/or implemented.
[x] Any new components follow a consistent style with respect to the pre-existing codebase.
[x] The architecture intuitively reflects the application domain, and is easy to understand.
[x] The architecture has a well-defined hierarchy of encapsulated components.
[x] The architecture is extensible and scalable.

Feb 06 '24 17:02 typedb-bot
