Externally-implemented modules for Catala
The problem
In this issue, we assume that Catala has a module system as described in #88. The problem we want to solve is exemplified by #81: we want for Catala programs to rely on abstract scopes and types which cannot be defined or implemented within the limits of the DSL.
Design of an external implementations systems
Syntax
To simplify the implementation, we only allow for whole modules to be externally implemented at a time. Externally implemented modules have to be define in standard .catala* files, except that the first line with the module name should bear:
> Module Foo external
While types and scopes have to be declared in such a file, it is impossible to write any scope usage that might introduce rules or definitions. Modules that use Foo can then use any of the externally implemented types and scopes defined in Foo.
Compilation
When compiling an externally implemented module to a backend like OCaml, the Catala compiler shall create a header file like *.mli or *.h. It is then up to the user to provide a manually-written implementation in the target language.
Intererpreter
See below
Filling out the "Interpreter" part after the design discussion for this issue.
There are essentially two steps to this process. The first one is to i) go over the types introduced by the external .catala* file, then ii) go over the externally-written definitions so as to "make them available" to the Catala interpreter located within the Catala compiler.
The second step is to embed/de-embed interpreter values and pass them to / receive them from externally-implemented definitions.
External Catala modules written in OCaml, via the interpreter
A first way to go about this is to rely on compiler-libs to perform on-the-fly compilation of the external OCaml files to bytecode, then use the resulting compiled functions from the interpreter. In greater detail:
- Catala processes
foo.catala_enwhich is an external module - Catala emits
foo_types.mliwhich contains only type declarations (although maybe standalone mlis are deprecated, I can't remember precisely) - Catala locates
foo_impl.mlwritten by the user - Catala starts an OCaml interpreter via compiler-libs, processes
foo_types.mlithenfoo_impl.ml, then the user-written functions become available via the OCaml interpreter's environment and can be called from Catala's interpreter
External Catala modules written in OCaml, via compilation and dynlink
To avoid relying on an unstable API such as compiler-libs, one can use instead the regular ocaml compiler... which results in a somewhat heavier process.
Steps 1.-3. remain the same. Then:
4. Catala compiles foo_types.mli to foo_types.cmi; foo_impl.ml tofoo_impl.cmxs using -shared
5. Catala dynlinks foo_impl.cmxs then obtains function pointers to the user-implemented external primitives
The drawbacks are that now, Catala needs to i) locate the "right" ocaml compiler that it was compiled with, ii) fork external commands, iii) perform minimal build system management to make sure the compilation only happens if the mtime of the hand-written files are different, etc. -- there's also a higher risk of segfaults if one updates the OCaml version, but fails to recompile Catala, etc.
Embeddings and de-embeddings
The idea is that within the interpreter, Catala terms are represented as:
type expr =
| IntLit of int
| Record of expr list
Going from the interpreter to a natively-compiled function, a function deembed goes from the deep embedding to a native representation of objects; it has type expr -> Obj.t. Sketch:
let deembed = function
| IntLit x -> Obj.obj x
| Record es -> let r = Obj.create_block (List.length es) in List.iteri (fun i e -> Obj.set_field r i (deembed e)); r
Going from the result of a natively-compiled function, a function embed goes from the native representation to a deeply embedded representation; it has type Obj.t -> typ -> expr where the typ argument is a the interpreter's representation of the external function's return type:
let embed o = function
| Int -> IntLit (Obj.repr o <: int)
| Record ts -> Record (List.mapi (fun i t -> embed (Obj.field o i) t) ts)
I'm not sure this function can be written safely, unless a GADT is introduced to represent the type-safe conversions.
Note : PR #103 already implements part of the embedding/deembedding problem, by providing a sum type runtime_value that already models all the Catala runtime values we want to embed for interoperability with the interpreter.