catala Externally-implemented modules for Catala

The problem

In this issue, we assume that Catala has a module system as described in #88. The problem we want to solve is exemplified by #81: we want for Catala programs to rely on abstract scopes and types which cannot be defined or implemented within the limits of the DSL.

Design of an external implementations systems

Syntax

To simplify the implementation, we only allow for whole modules to be externally implemented at a time. Externally implemented modules have to be define in standard .catala* files, except that the first line with the module name should bear:

> Module Foo external

While types and scopes have to be declared in such a file, it is impossible to write any scope usage that might introduce rules or definitions. Modules that use Foo can then use any of the externally implemented types and scopes defined in Foo.

Compilation

When compiling an externally implemented module to a backend like OCaml, the Catala compiler shall create a header file like *.mli or *.h. It is then up to the user to provide a manually-written implementation in the target language.

Intererpreter

See below

Mar 10 '21 21:03 denismerigoux

Filling out the "Interpreter" part after the design discussion for this issue.

There are essentially two steps to this process. The first one is to i) go over the types introduced by the external .catala* file, then ii) go over the externally-written definitions so as to "make them available" to the Catala interpreter located within the Catala compiler.

The second step is to embed/de-embed interpreter values and pass them to / receive them from externally-implemented definitions.

External Catala modules written in OCaml, via the interpreter

A first way to go about this is to rely on compiler-libs to perform on-the-fly compilation of the external OCaml files to bytecode, then use the resulting compiled functions from the interpreter. In greater detail:

Catala processes foo.catala_en which is an external module
Catala emits foo_types.mli which contains only type declarations (although maybe standalone mlis are deprecated, I can't remember precisely)
Catala locates foo_impl.ml written by the user
Catala starts an OCaml interpreter via compiler-libs, processes foo_types.mli then foo_impl.ml, then the user-written functions become available via the OCaml interpreter's environment and can be called from Catala's interpreter

External Catala modules written in OCaml, via compilation and dynlink

To avoid relying on an unstable API such as compiler-libs, one can use instead the regular ocaml compiler... which results in a somewhat heavier process.

Steps 1.-3. remain the same. Then: 4. Catala compiles foo_types.mli to foo_types.cmi; foo_impl.ml tofoo_impl.cmxs using -shared 5. Catala dynlinks foo_impl.cmxs then obtains function pointers to the user-implemented external primitives

The drawbacks are that now, Catala needs to i) locate the "right" ocaml compiler that it was compiled with, ii) fork external commands, iii) perform minimal build system management to make sure the compilation only happens if the mtime of the hand-written files are different, etc. -- there's also a higher risk of segfaults if one updates the OCaml version, but fails to recompile Catala, etc.

Embeddings and de-embeddings

The idea is that within the interpreter, Catala terms are represented as:

type expr =
  | IntLit of int
  | Record of expr list

Going from the interpreter to a natively-compiled function, a function deembed goes from the deep embedding to a native representation of objects; it has type expr -> Obj.t. Sketch:

let deembed = function
  | IntLit x -> Obj.obj x
  | Record es -> let r = Obj.create_block (List.length es) in List.iteri (fun i e -> Obj.set_field r i (deembed e)); r

Going from the result of a natively-compiled function, a function embed goes from the native representation to a deeply embedded representation; it has type Obj.t -> typ -> expr where the typ argument is a the interpreter's representation of the external function's return type:

let embed o = function
  | Int -> IntLit (Obj.repr o <: int)
  | Record ts -> Record (List.mapi (fun i t -> embed (Obj.field o i) t) ts)

I'm not sure this function can be written safely, unless a GADT is introduced to represent the type-safe conversions.

Mar 15 '21 20:03 msprotz

Note : PR #103 already implements part of the embedding/deembedding problem, by providing a sum type runtime_value that already models all the Catala runtime values we want to embed for interoperability with the interpreter.

Apr 09 '21 08:04 denismerigoux