Ampersand
Semantics of INCLUDE (this has to do with namespaces)
Problem
Namespaces require semantics that prepare us to work with distributed systems and allow us to do data migrations. So far, we have generated information systems with one unified namespace: up to Ampersand version 5.0, the semantics of the INCLUDE statement is plain set union. To support data migration, we need to work with three systems, one of which has an INCLUDE relation with the other two.
Requirements
Proposed solution
In issue #850 we decided to borrow Haskell's module mechanism, with one file for each module. Each file starts with a MODULE statement, so let us replace Ampersand's CONTEXT statement with the MODULE statement. Without any INCLUDE statements, Ampersand compiles the entire file into one information system containing a dataset, a schema, and a set of interfaces. So it compiles a module called ${\tt bar}$ to a triple $\langle D_{\tt bar}, S_{\tt bar}, F_{\tt bar}\rangle$. With an INCLUDE statement, we need to define that every identifier in the included module is known in the including module by the prefix "${\tt bar.}$". To define this renaming, we need an operator $\downarrow$, used only for defining the semantics in the compiler:
${\tt x\downarrow y\ =\ x\ <>\ "."\ <>\ y}$
I will overload this operator to work on information systems, datasets, schemas, interface sets, and their constituent elements as well: $x\downarrow y$ prepends the name $x$ and a dot to every identifier in the namespace of $y$. For example, if $y$ contains the name ${\tt client}$, then $x\downarrow y$ contains the name ${\tt x.client}$ at every qualifying occurrence of ${\tt client}$ in $y$.
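To make the renaming operator concrete, here is a minimal Python sketch (not Ampersand compiler code; the function name and the modeling of a namespace as a set of strings are assumptions for illustration):

```python
# Illustrative model of the renaming operator "x ↓ y": a namespace is a set
# of identifiers, and prefixing prepends the module name plus a dot to each.

def prefix(x: str, names: set[str]) -> set[str]:
    """x ↓ names: prefix every identifier in the namespace with "x."."""
    return {f"{x}.{n}" for n in names}

print(sorted(prefix("bar", {"client", "account"})))  # → ['bar.account', 'bar.client']
```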
Let ${\tt foo}$ and ${\tt bar}$ be information systems. Each has a dataset, a schema, and zero or more interfaces. Let $D_{\tt foo}$ and $D_{\tt bar}$ be their datasets, $S_{\tt foo}$ and $S_{\tt bar}$ their schemas, and $F_{\tt foo}$ and $F_{\tt bar}$ their sets of interfaces. Now we can define the system ${\tt foo\ INCLUDES\ bar}$ as:
$D_{\tt foo\ INCLUDES\ bar}\ =\ D_{\tt foo}\cup {\tt bar}\downarrow D_{\tt bar}$
$S_{\tt foo\ INCLUDES\ bar}\ =\ S_{\tt foo}\cup {\tt bar}\downarrow S_{\tt bar}$
$F_{\tt foo\ INCLUDES\ bar}\ =\ F_{\tt foo}\cup {\tt bar}\downarrow F_{\tt bar}$
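The three equations can be sketched componentwise in Python. This is an illustrative model only: the `System` triple and the modeling of each component as a set of names are assumptions, not the compiler's actual data structures.

```python
from typing import NamedTuple

class System(NamedTuple):
    dataset: frozenset[str]     # relation and concept names (D)
    schema: frozenset[str]      # rule, relation, concept, pattern, view names (S)
    interfaces: frozenset[str]  # interface names (F)

def prefix(x: str, names: frozenset[str]) -> frozenset[str]:
    """x ↓ names: prefix every identifier with "x."."""
    return frozenset(f"{x}.{n}" for n in names)

def includes(foo: System, name: str, bar: System) -> System:
    """foo INCLUDES bar: componentwise union, with bar's names prefixed."""
    return System(foo.dataset | prefix(name, bar.dataset),
                  foo.schema | prefix(name, bar.schema),
                  foo.interfaces | prefix(name, bar.interfaces))

foo = System(frozenset({"account"}), frozenset({"invariant1"}), frozenset({"Overview"}))
bar = System(frozenset({"client"}), frozenset({"invariant2"}), frozenset({"Clients"}))
print(sorted(includes(foo, "bar", bar).dataset))  # → ['account', 'bar.client']
```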
For the datasets, this means that all relation names and concept names in ${\tt bar}$ are prefixed with ${\tt bar}$; atoms are left alone. In the schema of ${\tt bar}$, all rule names, relation names, concept names, pattern names, and view names are prefixed with ${\tt bar}$. All rule names, relation names, concept names, and interface names from $F_{\tt bar}$ are prefixed with ${\tt bar}$.
Of course, name clashes can occur. If, for example, system ${\tt foo}$ contains a name ${\tt bar.account}$ and ${\tt bar}$ contains a name ${\tt account}$, the system $D_{\tt foo\ INCLUDES\ bar}$ has a name clash. We forbid such clashes to ensure disjoint-union semantics.
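A disjoint union that rejects clashes can be sketched as follows (illustrative Python, not compiler code; the function name and error message are assumptions):

```python
def disjoint_union(left: set[str], right: set[str]) -> set[str]:
    """Union of two namespaces; refuse any overlap to keep names unambiguous."""
    clash = left & right
    if clash:
        raise ValueError(f"name clash on {sorted(clash)}")
    return left | right

# foo already declares bar.account; bar's account becomes bar.account after
# prefixing, so the INCLUDE must be rejected:
try:
    disjoint_union({"bar.account", "client"}, {"bar.account"})
except ValueError as e:
    print(e)  # → name clash on ['bar.account']
```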
Alias
In the current implementation, two relation declarations with the same name, source, and target are treated as the same relation. I don't mind keeping this behavior, but it does not work across the INCLUDE mechanism (because we forbid name clashes). I propose to make this explicit with an ALIAS statement, for example:
ALIAS client, bar.client
This statement presumes that both aliases have the same type; otherwise we get type errors. Needless to say, the ALIAS statement also works inside a single namespace; it is not tied to the INCLUDE mechanism. Aliasing works for concepts and relations, but not for other named entities.
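Since ALIAS declares that two names denote the same relation, a compiler could resolve aliases as equivalence classes. One standard way to do that is union-find, sketched here in Python (an illustrative design choice, not the Ampersand implementation; type checking of the aliased names is omitted):

```python
class AliasTable:
    """Union-find over names: after alias(a, b), a and b resolve to the
    same canonical name, so ALIAS chains compose transitively."""

    def __init__(self) -> None:
        self._parent: dict[str, str] = {}

    def _find(self, name: str) -> str:
        self._parent.setdefault(name, name)
        while self._parent[name] != name:
            # path halving keeps lookups near-constant time
            self._parent[name] = self._parent[self._parent[name]]
            name = self._parent[name]
        return name

    def alias(self, a: str, b: str) -> None:
        """Record `ALIAS a, b`: merge the equivalence classes of a and b."""
        self._parent[self._find(a)] = self._find(b)

    def same(self, a: str, b: str) -> bool:
        return self._find(a) == self._find(b)

t = AliasTable()
t.alias("client", "bar.client")
print(t.same("client", "bar.client"))  # → True
```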
Consequences
This mechanism excludes cyclic INCLUDE dependencies. I expect the proposed mechanism to meet the requirements of the migration mechanism, but I will leave that to @sjcjoosten to verify. I hope that this include relation between information systems is transitive; if not, I would like to fix that, so we can draw an include graph of the system.
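Ruling out cyclic INCLUDE dependencies amounts to checking that the include graph is acyclic. A depth-first search does this; the sketch below is illustrative (the graph representation as a dict from module name to included modules is an assumption):

```python
def has_cycle(includes: dict[str, list[str]]) -> bool:
    """Return True iff the include graph contains a cycle (forbidden)."""
    WHITE, GREY, BLACK = 0, 1, 2          # unvisited / on current path / done
    color = {m: WHITE for m in includes}

    def visit(m: str) -> bool:
        color[m] = GREY
        for n in includes.get(m, []):
            if color.get(n, WHITE) == GREY:
                return True               # back edge to the current path: cycle
            if color.get(n, WHITE) == WHITE and visit(n):
                return True
        color[m] = BLACK
        return False

    return any(visit(m) for m in includes if color[m] == WHITE)

print(has_cycle({"foo": ["bar"], "bar": ["foo"]}))  # → True
```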
If module ${\tt foo}$ includes module ${\tt bar}$, we currently implement both ${\tt foo}$ and ${\tt bar}$ on the same database. For distributed systems, we will have to allow them to be implemented on different databases. I suggest we do that in another issue.