stanc3
stanc3 copied to clipboard
Modularize compiler around the idea of a "standard library" using functors
This is something I played around with on a slow day last week. I think it's an interesting idea, but not one I'm myself sold on. I'm opening it up to get feedback/discussion if we want to move in this direction.
This idea came from a discussion with @mandel and @gbdrt at ProbProg 2021. They had been working on an effort to translate Stan programs to Pyro and NumPyro at https://github.com/deepppl, and I asked them what about the current implementation of stanc made it easier or harder to do this kind of work. One thing that came up was the reliance on the Stan math library signatures.
Current implementation
At the moment, we have a file src/middle/Stan_math_signatures.ml which defines a lookup table of all the C++ functions we'd like to expose at the language level. We've essentially been treating these as built in to Stan, but I'd argue this is actually very backend specific. The fact that all these functions need to be defined for your backend to work is part of what stalled the Tensorflow backend, and it created a considerable effort for the deepppl folks, who defined a name-matching Python library: https://github.com/deepppl/stan-num-pyro/blob/main/stan-pyro/stanpyro/stanlib.py. For functions which don't exist or they don't support, they defined dummy signatures which throw a runtime exception.
If we're serious about stanc being architected to support alternative backends, doing something better here is pretty key. I would argue (and I think so would the name of the file) that, as a starting point, Stan_math_signatures belongs in the src/stan_math_backend folder, not the middle folder.
The changes
This PR is conceptually pretty simple. It defines a basic module interface for what we consider to be a "Stan standard library". This had previously been discussed in the review of #1115. The interface looks a lot like the Stan_math_signatures file, but genericified just a touch.
Here it is in full:
module type Library = sig
(** This module is used as a parameter for many functors which
rely on information about a backend-specific Stan library. *)
val function_signatures : (string, signature list) Hashtbl.t
(** Mapping from names to signature(s) of functions *)
val variadic_signatures : (string, variadic_signature) Hashtbl.t
(** Mapping from names to description of a variadic function.
Note that these function names cannot be overloaded, and usually require
customized code-gen in the backend.
*)
val distribution_families : string list
val is_stdlib_function_name : string -> bool
(** Equivalent to [Hashtbl.mem function_signatures s]*)
val get_signatures : string -> signature list
(** Equivalent to [Hashtbl.find_multi function_signatures s]*)
val get_operator_signatures : Operator.t -> signature list
val get_assignment_operator_signatures : Operator.t -> signature list
val is_not_overloadable : string -> bool
val is_variadic_function_name : string -> bool
val is_special_function_name : string -> bool
val special_function_returntype : string -> UnsizedType.returntype option
val check_special_fn :
is_cond_dist:bool
-> Location_span.t
-> Environment.originblock
-> Environment.t
-> Ast.identifier
-> Ast.typed_expression list
-> Ast.typed_expression
(** This function is responsible for typechecking varadic function
calls. It needs to live in the Library since this is usually
bespoke per-function. *)
val operator_to_function_names : Operator.t -> string list
val string_operator_to_function_name : string -> string
val deprecated_distributions : deprecation_info String.Map.t
val deprecated_functions : deprecation_info String.Map.t
end
Then, every module which currently depends on Stan_math_signatures was made into a functor over this module. This means that rather than there being a module called Typechecker, there is now a functor called Typechecking, which when you "call" it with an instance of a standard library module (the above type), it will give you back a typechecker module which works like the old one.
I tried to make as few functors as possible, so some modules just got a slight refactor (e.g. SignatureMismatch was being supplied the list of signatures for everything except operators, which it was checking from the library. Now it just gets supplied them for everything, so it doesn't need to depend on the library implemenation).
At the call site, this looks basically the same as the non-functor status quo. E.g., in stanc.ml, we add the line module Typechecker = Typechecking.Make (Stan_math_library) and then proceed as before by calling Typechecker.check_prog etc.
If anyone is wondering how to think about this, you can conceptually imagine Library as a Java interface, and functors like Typechecking as public class Typechecker<T extends Library>. This is not a good analogy for the complete ML module system, but for the specific use here of dependency injection it is more or less correct.
The Good
- Generic over backends. A project like deepppl could promote the errors they throw from runtime errors to compile time typechecking errors.
- Makes backend-specificity extremely apparent. If you had asked me last week "Does the AST to MIR translation depend on anything about the backend?", I would have answered "I sure hope it doesn't".
But it does! It turns out the lowering of
~statements is dependent on which exact signatures are defined! This refactor makes things like that extremely obvious, since the things that depend on the library look structurally different than those that don't (e.g. they're functors rather than normal modules).
The Bad
- Diff size. This had to touch a lot of files, but mainly what it was doing was adding one level of indent to them to account for the new
struct ... endsyntax required to define a functor. - Fussy syntax and extra signatures. OCaml functors are syntactically heavy to define, even simple ones like this. They also require extra signatures, often placed in their own
..._intf.mlfile (see #358 and discussion)
The Unsure-How-To-Feel
- Increased indirecton. This is good and bad in a way. Abstraction is what allows all the good things listed above, but does increase the learning curve for the compiler and makes it a bit harder to navigate, e.g. in the typechecker you can no longer right click in your editor and go to the place that the standard library is defined, since it could be any number of possible implementations of the
Librarytype. - So far I've been imagining each backend defining different sets of signatures but them all being subsets of the biggest/most developed backend (e.g. C++). This would mean there is always some interface which could compile every
.stanfile. However, there is nothing stopping a backend from defining a function which does not exist in any other backend, making them disjoint. This could be a pro or con, I think it is subjective.
Submission Checklist
- [x] Run unit tests
- Documentation
- [x] OR, no user-facing changes were made
Release notes
Refactored internals of the compiler to be less tied to the specific functions exposed in the Stan Math C++ library.
Copyright and Licensing
By submitting this pull request, the copyright holder is agreeing to license the submitted work under the BSD 3-clause license (https://opensource.org/licenses/BSD-3-Clause)
If you had asked me last week "Does the AST to MIR translation depend on anything about the backend?", I would have answered "I sure hope it doesn't".
It'd certainly be nice if it didn't. I don't have a good sense of the weight of the downsides here, though.
Functors are pretty common in OCaml (e.g., they're how one would implement any of the common generic data structures like Set, Map, etc) but they can still take some time to get used to.
I've also been trying to use an alternative feature called "virtual libraries" which covers similar use cases while being much simpler (no module applications, etc), but I am not sure if it is possible to do it in this way due to some issues that arise with circular dependencies which functors cleverly avoid.
Scratch that, I believe I have it working with the much lighter-weight virtual library approach. This does lose the benefit that things that use the library look structurally different, but that is the very thing that makes functors "heavy" syntactically
I'm going to close this in favor of #1184
I've re-opened this as an alternative to #1184 after today's language meeting. I want to do another pass on it of cleanup still, which I did do in #1184 after this
Codecov Report
Merging #1182 (3823560) into master (4d6664c) will decrease coverage by
0.12%. The diff coverage is89.92%.
@@ Coverage Diff @@
## master #1182 +/- ##
==========================================
- Coverage 88.79% 88.67% -0.12%
==========================================
Files 64 68 +4
Lines 9844 9970 +126
==========================================
+ Hits 8741 8841 +100
- Misses 1103 1129 +26
| Impacted Files | Coverage Δ | |
|---|---|---|
| ...c/analysis_and_optimization/Dependence_analysis.ml | 100.00% <ø> (ø) |
|
| ...rc/analysis_and_optimization/Monotone_framework.ml | 91.20% <0.00%> (ø) |
|
| ...alysis_and_optimization/Monotone_framework_intf.ml | 100.00% <ø> (ø) |
|
| src/analysis_and_optimization/Optimize.ml | 92.77% <ø> (+0.12%) |
:arrow_up: |
| src/analysis_and_optimization/Pedantic_analysis.ml | 93.42% <ø> (ø) |
|
| src/frontend/Std_library_utils.ml | 50.00% <50.00%> (ø) |
|
| src/stan_math_backend/Stan_math_library.ml | 97.34% <84.12%> (ø) |
|
| src/analysis_and_optimization/Memory_patterns.ml | 87.19% <86.59%> (-0.57%) |
:arrow_down: |
| src/frontend/Info.ml | 89.23% <87.87%> (+0.16%) |
:arrow_up: |
| ...rc/analysis_and_optimization/Partial_evaluation.ml | 89.61% <89.61%> (ø) |
|
| ... and 13 more |