hydra
hydra copied to clipboard
Algebraic types II
Background
As already explained in the background section here
https://github.com/Joystream/joystream/issues/554
We have a problem domain which has algebraic types all over the place, and it would be a big benefit if they could be reflected neatly in our query infrastructure. As explained in that first post, doing it deeply (approach 4), is ideal, however a later post points out
https://github.com/Joystream/joystream/issues/554#issuecomment-640254649
The resulting GraphQL API must generate OpenCRUD like capabilities that tie into matching these algebraic types, and this is a non-trivial task.
This issue attempts to clarify how this can be done.
Proposal
Input Schema
Algebraic Types
The new algebraic types respect the GraphQL standard, specifically they are informally defined as
-
union <name> = T_1 | ... | T_N
where name is a GraphQLNamedType
, andT_i
are algebraic types, and referred to as cases of the type. This algebraic type is called an algebraic union. -
type <name> @variant { f_1: T_1 ... f_N: T_N }
where name is a GraphQLNamedType
,f_i
is a GraphQLName
,T_i
is a non-ID GraphQLType
or an algebraic union. This algebraic type is called an algebraic variant, and such a type without any algebraic union members is called called a flat variant.
An algebraic union can be a member field of one or more normal @entity
types, and can use non-null requirement. An algebraic variant cannot be a member of an @entity
type.
Here is an example
type Miserable @variant {
hates: String!
}
type HappyPoor @variant {
isMale: boolean
}
union Poor = HappyPoor | Miserable
type MiddleClass @variant {
father: Poor
mother: Poor
}
type Rich @variant {
bank: EntityC
}
union Status = Poor | MiddleClass | Rich
type EntityA @entity {
id: ID!
status*: Status!
}
type EntityB @entity {
id: ID!
status_b: Status!
status_b: Status
}
type EntityC @entity {
id: ID!
}
Relationships
When an algebraic union is a member in an entity type, there are cross-entity constraints around how entity member fields, or list thereof, because they model relationship semantics. This is also the case for normal entities.
In order to validate these requirements, just proceed as if every entity has any field in occurring in a member algebraic type as a top level field. If the resulting set of entities have valid relationship references, then the original usage of entity member fields in algebraic types is valid.
Concepts
An algebraic type can be represented as a labelled tree as follows
-
tree(union type <name> = T_1 | ... | T_N) = T[UNION||name, (ε, tree(T_1)), ..., (ε, tree(T_N))]
-
tree(type <name> @variant { f_1: T_1 ... f_N: T_N }) =
-
T[VARIANT||name, (f_{g_1}, tree(T_{g_1})), ... , (f_{g_M}, tree(T_{g_M}))]
whereg_i
are distinct indexes of one or more algebraic member fields. -
Node[VARIANT||name]
when all fieldsf_i
are non-algebraic.
-
where
- for labelled non-empty trees
c_1,...,c_n
, and string labelsl_1,...,l_n
,T[x, (l_1, c_1), ..., (l_N, c_N)]
is the tree with the root labelled withx
and the roots ofc_1,...,c_n
as children, each with an edge labelledl_1,...,l_n
. -
Node[name]
is a labelled node without any children. -
||
is a string concatenation operator. -
ε
is the empty string.
Any node in such a tree that corresponds to a algebraic union type, is called a union node, an any node corresponding to a variant type is called a variant node.
Given such a tree we can define the idea of coherent union, which is a subset of union nodes in the tree such that if you mark all edges from a node in the set all the way to the root, then this marked tree should have no union node which has more than one marked edge with a child. From this it should be clear that for any two nodes in such a set, the way that their paths to the root avoid violating this constraint is that the join up in some variant node because it has at least two union members. Here is the set of all coherent unions for Status
above
-
{Status}
-
{Status/Poor}
-
{Status/MiddleClass.father}
-
{Status/MiddleClass.mother}
-
{Status/MiddleClass.father, Status/MiddleClass.mother}
-
{Status/Rich}
where each node is represented by its path from the root. Computing the set of of all such matches is trivially done recursively by coherent_unions(ε, tree(Status))
-
coherent_unions(s, T[UNION||name, (ε, T_1), ..., (ε, T_N)]) = {s||name} U coherent_unions(s||name/,T_1) U ... U coherent_unions(s||name/,T_N)
-
coherent_unions(s, T[VARIANT||name, (f_1, T_1), ... , (f_N, T_N)]) = coherent_unions(s||name||.||f_1,T_1) U ... U coherent_unions(s||name||.||f_N,T_N) U all_combined_coherent_unions(s, name, (f_1, T_1), ... , (f_N, T_N))
-
coherent_unions(Node[VARIANT||name]) = Ø
where all_combined_coherent_unions(s, name, (f_1, T_1), ... , (f_N, T_N))
will for be the union of
- for each non-empty subset of inputs
T_{g_1},...,T_{g_M}
whereM>1
do the next step - compute
C_{g_i} = coherent_unions(s||name||.||f_{g_i},T_{g_i})
- return flattened version of
C_{g_1} X ... X C_{g_M}
, i.e. where tuples are turned into sets.
GraphQL API
The key goal of the generated API for entities that have one or more algebraic member types is to allow for safe, expressive and practical queries that are sensitive to the algebraic structure. The key observation in the resulting API is that such queries each correspond to the concept of a coherent union, defined prior, as follows.
- Pick an entity type which has algebraic union member fields
f_1: T_1,...,f_N:T_N
. Notice that we allow for the entity type to have multiple union member fields. - Compute the set of coherent unions
C_i
for each union typeT_i
. - For each subset of fields
f_{g_1}, ..., f_{g_M}
do the next steps. - For each
(c_1,...,c_M)
inC_{g_1}x...xC_{g_M}
do the next steps. Recall thatc_i
is a set of union nodes. - For each
(n_1,...,n_M)
inc_1,...,c_M
do the next steps. - For each
(v_1,...,v_M)
wherev_i
is a child node ofn_i
, generate the following query:
- accepts where and order-by OpenCrud input types for each
v_i
if type is suitable (i.e. has comparable fields), and also pagination inputs. - returns a type which is the result of taking the initial entity type, taking each
v_i
and replacingn_i
in the type tree to which it corresponds.
Database
Here the idea is very simple, simply take the table for any entity type and fully flatten, in the natural way, the type tree of any algebraic type. Union case indicators should be encoded as database level enumerated types. There should also
The TypeORM embedded entities approach may be a natural way to do this at the query node level, as it allows for the same class for an algebraic type to be reused across multiple fields in an entity, or multiple entities, as that is possible.
The most important thing is to capture as many constraints as possible at both the ORM and database level. The former will give the mapping author an ergonomic and safe interface to work with. It should indeed be possible to offer them a fully statically type safe interface for working with a given generated schema. The latter will protect against inadvertent representational corruption, for example when writing migration or initialisation code in the future.
Supporting variant types requires different design trade-offs in order to address both Postgres and GraphQL limitations.
GraphQL input types
The main limitation at the GraphQL side is the lack of union input types: https://github.com/graphql/graphql-js/pull/1196 At the database (e.g. Postgres) level the fundamental limitation is the "flat" structure of the tables, that is, the set of columns = entity properties is fixed by the table schema.
There are no mutations, and the API should offer a set of self-describing queries. A typical query returns a list of entities (more precisely, a specified set of fields) filtered by the where
clause.
A query per variant type
The approach described above suggests reflecting the type system by defining a lot of separate queries for each combination of the fields the fully flattened entity may have. In the example above, the flattened version ofEntityA.status
have fields hates, isMale, bank, father, mother
, coming from the different realizations of the status
field. The main advantage of having a separate query for each possible realization is the "code completion" feature provided for each input type, tailored to each variant of the union type. The drawback is that the number of queries explodes if there are entities with multiple fields of algebraic type and/or if the input types form a complex tree as in the example above.
A bag of input fields approach
An alternative approach is to combine all the possible input fields into single 'JUMBO-INPUT' type:
EnitityAWhereInput {
hates_eq: String,
isMale_eq: boolean,
bank: String,
father_hates_eq: String,
father_isMale_eq: String,
mother_hates_eq: String,
mother_isMale_eq: String,
...
}
Many combinations of the input fields are impossible, but that would simply return an empty list.
The benefit of this approach is a conceptual and implementation simplicity (both at the Warthog-middleware and TypeORM level). The drawback is that the API consumer has to have an out-of-band
knowledge of which input fields are actually compatible.
JSON type in InputTypes
A more flexible approach is to support free JSON in the where input. More concretely, in the example above, EntityAWhereInput
will have, among others, a JSON field status_json
. Such an input field accepts arbitrary JSON object, with the scalar attribute filtered according to OpenCRUD spec. For example, the query for selecting EntityA
where the status is MiddleClass
and the father is HappyPoor
is going to be
query {
accounts(
where: {
status_json: {
father: { isMale_eq: true }
}
}
) {
id
status {
__typename
... on MiddleClass {
father {
... on HappyPoor {
isMale
}
}
mother {
... on Miserable {
hates
}
}
}
}
}
}
We adopt the latter option (JSON input type)
Implementation details
Under the hood, the union types are stored as jsonb
type columns, [natively supported by PostgresSQL] (https://www.postgresql.org/docs/current/functions-json.html). This enables a rather straightforward translation from the JSON field of the WhereInput into the json-specific SQL syntax of Postgres (this part is implemented in warthog).
The TypeScript types and classes are created using the GraphQL union type integration provided by the type-grapqhl
library. This gives type-safety for the entity classes, which can later be exported for independent external use (as done by the indexer mappers).
Current limitations
The following constraints must hold, otherwise a validation error will be thrown.
- A
@variant
must be an ObjectType (scalars, unions, and interfaces are not allowed). - A variant type field must be either of
- scalar type
- union type No entity references are allowed.
- Entity types cannot have fields of variant type (but union types are allowed).
- A union type must be composed exclusively out of
@variant
types.