RFCs
RFCs copied to clipboard
Proposal: constructor unboxing
[I've replicated the proposal in this comment for ease of reading.]
Constructor unboxing
Motivating example: compact rope representation
Data structures defined in OCaml are often less compact than they might be, because of boxing.
For example, here is a type for representing ropes:
type rope = Leaf of string
| Branch of { llen: int; l:rope; r:rope }
With this definition the value Branch {llen=3; l=Leaf "abc"; r=Leaf "def"} has the following representation:
[--B|-3-|-∘-|-∘-]
/ \
/ \
/ \
[--L|-∘-] [--L|-∘-]
/ \
/ \
"abc" "def"
In the general case each part of this representation serves a purpose. For example, in order to distinguish Leaf nodes from Branch nodes at run-time, each constructor is represented by a tagged block. However, for this particular data type the block representing Leaf nodes ([--L|-∘-]) is unnecessary; since strings are already distinguishable from other blocks, the value could in principle be represented more compactly:
[--1|--3|-∘-|-∘-]
/ \
/ \
/ \
"abc" "def"
The basic idea
Adding an [@@unboxed] annotation to a variant definition indicates that the representation of unary constructors should not involve an additional block:
type rope = Leaf of string
| Branch of { llen: int; l:rope; r:rope } [@@unboxed]
Only a subset of variant definitions support [@@unboxed]. In particular, it must be possible to distinguish the arguments of unary constructors from each other (and from constant constructors in the same definition) at run-time. For example, the following definition is not allowed, since the arguments of X and Y have the same representation:
type strings = X of string | Y of string [@@unboxed] (* Invalid! *)
Performance improvements
Since the unboxed variant representation uses less allocation and less indirection, it improves performance in some cases.
For example, here is a simple benchmark for the rope data type. The benchmark creates rope representations of size n, converting the ropes to strings in a final step:
let rec build n =
if n = 1 then leaf "a"
else let llen = succ (Random.int (pred n)) in
branch (build llen) (build (n - llen))
in
string_of_rope (build n)
Measurements show substantial improvements for the unboxed representation, especially for larger values of n:
| Size | Boxed (μs) | Unboxed (μs) | Unboxed % |
|---|---|---|---|
| 2^6 | 3.90 | 3.81 | 97.7 |
| 2^8 | 15.99 | 15.38 | 96.2 |
| 2^10 | 64.82 | 62.32 | 96.1 |
| 2^12 | 281.13 | 257.30 | 91.5 |
| 2^14 | 1560.99 | 1220.11 | 78.1 |
| 2^16 | 10089.72 | 5332.93 | 52.9 |
| 2^18 | 50027.06 | 35030.16 | 70.0 |
(These measurements were taken by building the unboxed representation explicitly using Obj rather than actually implementing this proposal in the OCaml compiler.)
Which types are distinguishable?
Eliminating the block associated with a constructor is safe only when:
- the representations of distinct constructors remain distinguishable at run-time
- no type can represent both non-float and float values (to avoid problems with the float array unboxing optimization)
This proposal currently focuses on concrete types; it may be extended to abstract type constructors and existential variables by building on the notion of separability introduced in Unboxing Mutually Recursive Type Definitions in OCaml (Colin, Lepigre, Scherer, JFLA 2019).
Consider the definition of a data type t with unary argument types t1...tn and constant constructors C1...Cn:
type t = C1 | ... | Cn | T1 of t1 | ... Tn of tn
The simplest case where constructor unboxing is clearly safe is where t1...tn all have non-immediate and distinguishable representations, and none of them is either float or t.
However, constructor unboxing is also possible if some of the t1...tn have immediate representations. For example, consider the following definition:
type u = C1 | C2 | C3 of bool | C4 of char
In OCaml C1, C2, bool and char are all represented as immediates, and a single immediate value (i.e. an int) can easily represent all of these values. We can therefore adopt the following representation:
| Value | Representation |
|---|---|
C1 |
0 |
C2 |
1 |
C3 false |
2 |
C3 true |
3 |
C4 c |
4 + Char.code c |
More generally, we can unbox the definition t if:
-
the block-representations of
t1...tnare all disjoint (and distinct fromfloatand fromtfor non-unary constructors) -
the immediate-representations of
t1...tntogether withC1...Cncover less than the immediate space.
(Here block-representation indicates the set of tags that non-immediate values of a type can have, and immediate-representation indicates the number of distinct immediate values of a given type. Covering less than the immediate space means that the sum of the immediate representations of t1...tn is less than max_int.)
How does this relate to the existing [@@unboxed] annotation?
This proposal is a conservative extension of the existing [@@unboxed] annotation (PRs #606, #2188). It has no effect on existing code, and the extended meaning of [@@unboxed] is compatible with the existing meaning.
How does this relate to the existing proposal for unboxed types?
Another open proposal involves unboxing field types into parent types --- for example, producing flat representations of int32 pairs:
type t = { x : #int32; y : #int32 }
The two proposals are distinct, but complementary. The field-unboxing proposal does not support the compact rope representation, the current proposal does not support pair unboxing, and the combination of the two proposals can support representations that are more compact than either proposal supports in isolation. In the following example, the int32 arguments are unboxed into a single block using field unboxing, and the block associated with the Pair constructor is eliminated using constructor unboxing:
type t = { x : #int32; y : #int32 }
type topt = Nopair | Pair of t [@@unboxed]
Without any support for unboxing, the value Pair {x=0l; y=0l} involves 4 blocks; with field unboxing alone it involves 2 blocks; with constructor unboxing alone it involves 3 blocks; with both field and constructor unboxing it involves 1 block.
(There are some additional connections and distinctions between the two proposals. The current proposal can be seen as a generalization of the nullable types mentioned in the field unboxing proposal. And the field unboxing proposal is more ambitious: it additionally introduces compound unboxed types and changes to the type system to combine unboxing with abstraction.)
Extension: partial unboxing
The current proposal requires all unary constructor arguments to have distinguishable representations. It might also be useful to support the case where only some of the arguments are distinguishable by allowing per-constructor unboxing specifications.
Comment: abstraction
Unboxability (for records) is a property of a single type, and is fairly straightforward to abstract.
However, separability is a relation between types, and is not so easily abstractable.
One possibility is to optionally expose the immediate-space and the tag-space of abstract types by extending the [@@immediate] attribute. For example, we might promise that a type has an immediate representation with no more than 256 distinct values:
type t [@@immediate 0..255]
We might additionally support setting the tag value associated with particular constructors explicitly to avoid clashes:
type t = { x: int; y: float} [@@tag 240]
I like this proposal, and these are definitely optimisations I want to be able to express. However, I think that there are two separate optimisations being proposed here which I'll call inlining and disjointness, and I think it's worth considering the two independently, rather than conflating them with the immediate/block distinction as this proposal currently does.
Inlining and disjointness
Inlining combines the constructors of different types into a single sum representation. Given:
type ab = A | B
type cd = C | D
type u = Foo of ab | Bar of cd [@@unboxed]
this proposal inlines the definition of ab and cd into u, giving u the same representation as currently given to:
type u = Foo_A | Foo_B | Bar_C | Bar_D
Inlining amounts to reasoning about sum types modulo associativity, transforming (A + B) + (C + D) into A + B + C + D.
Disjointness represents certain constructors of different types as the identity, when this introduces no ambiguity. Given:
type ef = E of int | F of int * int | G
type t = Stuff of ef * int | Num of float [@@unboxed]
this proposal represents the Num constructor as the identity, on the basis that its values are represented using Double_tag which collides with none of the other possible values. This type is not currently expressible in OCaml, but its values are exactly the values represented by either of these two types:
type t1 = Stuff of ef * int
type t2 = float
Disjointness amounts to transforming A + B into A ∪ B, when A and B do not overlap.
This proposal
If I understand correctly, this proposal currently operates on type definitions type t = C1 of ... | C2 of ... | ... [@@unboxed] by:
- selecting all of the single-argument constructors
Ciwith argumentsai - applying inlining to the immediate values of the types
ai - applying disjointness to the block values of the types
ai
I don't think that this immediate = inlining, block = disjointness split is natural, and it makes two otherwise-orthogonal optimisations less useful. For instance:
- Inlining blocks A slight variant of the type
uabove is:
type ab' = A of int | B of int
type cd' = C of int | D of int
type u' = Foo of ab | Bar of cd [@@unboxed]
The inlining optimisation is just as applicable and just as useful as before. I would like to represent u' as though it were:
type u' = Foo_A of int | Foo_B of int | Bar_C of int | Bar_D of int
However, upon seeing block constructors, the currently proposal switches from inlining to disjointness, which fails as the types ab' and cd' are not disjoint.
(The converse, applying disjointness to immediates, does not appear to be useful. It is very difficult to construct two immediate types that are disjoint, as all immediate types tend to assign a meaning to 0).
- Multi-argument constructors The single-argument restriction is necessary for disjointness but irrelevant for inlining. For instance, we could still inline the types
abandcdbelow:
type v = Two of ab * cd
by representing it as though it were
type v = Two_A_C | Two_A_D | Two_B_C | Two_B_D
(This amounts to reasoning about distributivity of products over sums, as well as associativity of sums)
- Partial unboxing As mentioned in the proposal itself, it would be useful to be able to specify particular constructors to unbox. In particular, this allows unboxing of the type
stringsmentioned above, by unboxing either (but not both) of the arguments.
So, with this in mind, might it not make more sense to have separate annotations for inlining (per-field) and disjointness (per-constructor), rather than a single per-type [@@unboxed] annotation? In particular, I suspect the #t syntax from #10 could be reused for inlining, as it's essentially the same (or dual) optimisation.
Abstraction
The distinction between inlining and disjointness reappears when considering abstraction. Inlining requires no notion of "separability" - whether something can be inlined depends only on its layout, and is exactly as hard as inlining records in the manner proposed in #10. (Well, I imagine the implementation will be more work, as there's the pattern-matching compiler to worry about)
Separability arises only for disjointness optimisations. While a precise analysis of disjointness does require a binary relation between types (which as mentioned above is annoying to abstract), a simple approximation seems to get most of the value of this relation.
We can introduce a new layout (i.e. kind) describing those types whose block values do not lie in the tag space used for datatype constructors. This layout includes types string, float, int32 and so on. The disjointness criterion than then allow at most one constructor with such a type to be represented as the identity.
This is a very coarse approximation. However, it suffices to accept all of the motivating examples in this proposal. (Finer approximations are also possible. For instance, having separate layouts for string and float tags would allow both a string and a float constructor to be simultaneously unboxed via disjointness)
The way I think of it, constructors type foo = ... | Bar of bar denote an embedding of bar into foo that gives distinguishability (from other constructors) -- initiality / an elimination principle / the ability to pattern-match. The syntactic piece of data Bar(...) may be realized in various way in low-level representations; the default to create a new block with a constructor tag chosen for distinguishability from other constructors of t, but any other embedding is acceptable as long as distinguishability is preserved.
Single-constructor unboxing uses exactly identity (of low-level representations) as the embedding functions. This is also what's going on with what Stephen calls "disjointness", but not with "unboxing": with unboxing, the values are changed by the embedding: in the example of Jeremy where booleans are inlined, false, true become 2, 3 (this shifting is the computational content of the embedding).
Embeddings are restricted by the fact that they must preserve identity of mutable state: "inlining" may not be possible if the argument of the constructor to inline is a record with mutable fields.
Here's another example where this would be useful: Z.t values in Zarith are either immediate integers or custom blocks. Both the type itself and (consequently) much of the implementation is currently written in C; with this proposal it'd be possible to express the type definition in OCaml without changing the representation:
type t = Small of int | Large of mpz_t [@@unboxed]
The FFI story is not obvious here. We might want the compiler to be allowed to be as clever as possible as soon as the user ask for unboxing (or any other kind of change in representation). Maybe we should just say that the representation is not fixed in that case and this type is not fit for FFI (or can only be manipulated as an abstract type on the C side). A warning triggering whenever that kind of type can reach a C function would be nice. Such a warning would probably be neither correct nor complete but for obvious cases it could still probably help.
There is also another question that arise with that kind of performance related annotations. How do user discover about that kind of features ? Should there be a mode where the compiler suggest what transformation could be possible ? Something like that would require an non local analysis (you need to annotate some other mlis than the one you are looking at to know that something is possible)
As a last comment, the notion of separability might not be as stable as one could assume. There is some work going on to try to compile OCaml to webassembly. Such a backend has less control on the shape of values and in particular might not have tags for float or string values.
Thinking more about this as well:
- I like the general idea of the proposal, and I think that being able to write
type zarith = Small of int [@@unboxed] | Large of mpz_tis a killer application. - I would prefer if the
[@@unboxed]attribute was per-constructor rather than global. (This was also hinted at by @garrigue.) The meaning of each[@@unboxed]is that this constructor is represented as runtime by a no-op, because the payload is disjoint from all other values of the type. - Reusing @stedolan's distinction between inlining and disjointness, I would concentrate on the "disjointness" part for a minimal version of the RFC, without playing with "inlining" (of a sum type into another sum type) at first.
- When we do inlining, I would prefer if we manually specified the representations of the constructors, instead of letting the compiler perform an implicit transformation. The unboxing step itself would preferably remain the identity, to avoid copy effects.
For example I would not be in favor of either
type cd = C | D of int
type abcd = A | B | CD[@unboxed] of cd
(* Bad: not explicit enough on representations *)
type cd = C | D of int
type abcd = A | B | (CD of cd [@unboxed function C -> 2 | D -> 0])
(* Bad: (function CD x -> x) allocates *)
but I would be happy with either of those clearly-disjoint declarations:
type cd = C | D of int
type abcd = A [@repr 1] | B [@repr 2] | CD [@unboxed] of cd
type cd = C[@repr 3] | D of int
type abcd = A | B | CD [@unboxed] of cd
@chambart : two answer two of your questions
-
I propose to avoid implicit transformations where the compiler "finds a representation such as X", but instead favor explicit annotations that contain, when necessary, annotations on which representation to use. In particular, this clearly specifies a FFI story, just like
[@@unboxed]does: expert users have to specify the representation of OCaml values to use these annotations (no guessing from the compiler), and they can also use the C FFI to manipulate them. -
The whole idea of attributes is that different consumers of the code could interpret them differently. js_of_ocaml or wasm_of_ocaml may not use some of the unboxing attributes, and I think that this is fine. OCaml has a high-level semantics where those attributes are ignored, and (several) low-level semantics where we think about value representation, observing allocations etc., and the latter may differ among realizations of the language.
* I would prefer if the `[@@unboxed]` attribute was per-constructor rather than global. (This was also hinted at by @garrigue.) The meaning of each `[@@unboxed]` is that this constructor is represented as runtime by a no-op, because the payload is disjoint from all other values of the type.
I fully agree with @garrigue and @gasche : the annotation should be per constructor so that the user knows exactly which constructors are omitted, and the compiler never chooses which of two constructors should be omitted.
OCaml has a high-level semantics where those attributes are ignored
Note that this is not entirely the case because the representation needs to be taken into account when checking implementations against interfaces. I think this is still all fine -- it just means that your code needs to obey the rules of the native OCaml representation even when you are actually compiling it to JavaScript.
The whole idea of attributes is that different consumers of the code could interpret them differently. js_of_ocaml or wasm_of_ocaml may not use some of the unboxing attributes, and I think that this is fine. OCaml has a high-level semantics where those attributes are ignored, and (several) low-level semantics where we think about value representation, observing allocations etc., and the latter may differ among realizations of the language.
Sure, but we might have a problem with js_of_ocaml if it needs to have a different interpretation of the attributes than the bytecode compiler (since, well, it is "just" a postprocessing on its output).
In a sense the point is that if a given compiler pass makes choices based on assumptions on runtime value representations, then only OCaml implementations that satisfy those assumptions can safely reuse this compiler pass. Adding low-level features to the surface language that are based on those assumptions means that those assumptions are now made earlier in the pipeline, and they could in general invalidate design choices of some implementations branching from the main compiler after some passes.
In general if the problem arise we always have the option of having a flag to not perform the representation change during compilationoptimization. js_of_ocaml (or wasm_of_ocaml) would then compile using this flag for the lower passes.
For this optimization for js_of_ocaml, I don't foresee a problem. If I understand correctly, js_of_ocaml stores the tag of blocks, uses JS closures for closures (and I expect we can distinguish them from arrays by introspection). It uses javascript "numbers" (float, I presume?) for both integers and floats, but I expect that numbers represent unboxed floats, and that boxed floats still carry their type around. (In any case we don't allow unboxing floats so this should be safe, but the same question could apply to say int versus Int64.t in a runtime that would not box Int64.t.)
One should ask people who knows better about js_of_ocaml (ping @hhugo), but I believe floats are indeed unboxed. And one could imagine (if it's not already the case) that int32 would also be represented with Javascript numbers (so, cannot be distinguished from int and floats, contrary to OCaml).
It's certainly possible to tell ocamlc to not perform the representation optimization when the result is intended to be fed to js_of_ocaml, but then we lose the ability to reuse existing .cma libraries compiled for "normal" use. I'm not saying this is an argument against the optimization (which I like very much), but we need to take that into account (perhaps the conclusion is that js_of_ocaml should introduce its own file suffixes, not reuse .cmo/.cma, and explicitly requires re-compilation; a bit more like Bucklescript, I guess).
perhaps the conclusion is that js_of_ocaml should introduce its own file suffixes, not reuse .cmo/.cma, and explicitly requires re-compilation
This sounds pretty inconvenient for users that have single codebase targeting both native and js (via jsoo), especially when depending on heavy packages like core-kernel. Switch rebuild takes considerable amount of time, and above suggestion will effectively double it.
My understanding is that the part of the build that would need to be duplicated (parsing, type-checking and bytecode compilation) takes only a fraction of the time, compared to either native-code production or jsso optimizations and javascript production. I picked a single module (tool/ocamlprof.ml from the compiler distribution), producing the .cmo takes 0.070s on my machine (0.060s from typing), producing the .cmx takes 0.120s on my machine (again 0.060s from typing), calling jsoo on the .cmo takes 0.140s. If you want a switch that installs jsoo modules for all packages, today building this module would take 0.330s (0.070s + 0.120s + 0.140s), tomorrow the .cmo-production step would be duplicated so it would take 0.400s. This is far from a doubling of compilation time.
Of course this could be reduced further by, say, having an intermediate output for the typedtree, so that all three builds (.cmx, .cmo, .js) could reuse it. Then the build time would go down from 0.330s today to 0.280s.
My understanding is that the part of the build that would need to be duplicated (parsing, type-checking and bytecode compilation) takes only a fraction of the time, compared to either native-code production or jsso optimizations and javascript production.
I would just like to point out that this is not at all accurate on Jane Street's code base. Type checking is comfortably the largest cost in compilation.
I don't think it is particularly relevant to this discussion though, because I think that needing to use different front-end options for js_of_ocaml is both unnecessary and not really a viable suggestion. A better option is to keep the constructors and destructors of these unboxed constructors until later in the compilation pipeline so that exotic back-ends can decide whether to unbox or not based on whether the representations are actually separated on that platform.
A better option is to keep the constructors and destructors of these unboxed constructors until later in the compilation pipeline so that exotic back-ends can decide whether to unbox or not based on whether the representations are actually separated on that platform.
js_of_ocaml currently parses .cmo files; do you mean that we should propagate explicitly the "constructors/destructors" down to the bytecode level? That doesn't sound right. Keeping the information up, to, say, the lambda level would make sense, but then one would need to dump the lambda code and have js_of_ocaml starts from there (as Bucklescript does, IIUC).
This is a bit tangent to the discussion but regarding js_of_ocaml I wouldn't like it to add its own compilation objects. There's already quite a menagerie of compilation objects in the eco-system and starting from bytecode has the following good properties:
- Suppose you have a pure OCaml library you install via opam. The author of that library is not interested in
js_of_ocamland doesn't support it in its build system. Yet you can just install the library and use it in yourjs_of_ocamlproject. No need to go bother that author or fiddle to have your opam switch/packages build in a different way. - I don't know where Bucklescript starts but it's notoriously lagging behind OCaml versions. Starting from bytecode offers a relatively stable interface to the compiler's outputs which seemed to have enabled
js_of_ocamldevs to cope with OCaml development without too much work.
I think that for js_of_ocaml to start from bytecode is a very good call both from a usability and maintenance perspective.
Regarding jsoo: It might be the case that that most of the proposals here are compatible with jsoo, with only a smaller subset of them requiring more thought.
When compiling to JS, Strings will always have some way to be distinguished at runtime. Even if jsoo's string implementation removes the boxing around strings by default (JS engines always provide a type tag to check strings).
Even some of the proposed optimizations around partitioning an integer range (bool/char etc) across a set of variant constructors seems like it wouldn't cause problems (so long as the bytecode includes all of the information for renormalizing their values when they leave their "constructors"). The only thing that seemed like a potential problem for jsoo was any optimization that relies on floats always being reliably tagged at runtime. Right now floats are not, and it does cause some compatibility issues, though for a JS target it is a welcome tradeoff because you don't want to have to pay the price of allocating boxes around floats, on top of the price of the VM's NaNboxing at runtime (which most engines do). Is it perhaps a good idea to avoid any optimizations that rely on the runtime representation of floats - not just for jsoo's sake, but to make it easier to later change the representation of floats in OCaml itself?
On the other hand, if all the other optimizations mentioned are compelling enough, maybe it would be worthwhile to add boxing around floats in jsoo in order to get them. It would solve some of the compatibility issues that jsoo currently has with floats (unmarshaling). For perf sensitive applications, applications could explicitly use an unboxed Js.float.
Either way, it would be nice if jsoo could take advantage of all of the optimizations that do easily apply to jsoo's compilation approach. Does it need to be all or nothing?
Regarding jsoo-specific compiler artifacts: I think this has the potential to cause compatibility, reliability, or even fragmentation issues within the ocaml ecosystem. jsoo's strength is that it is ocaml ecosystem compatible, and unless you use native C bindings, your packages more or less work well when compiled with jsoo. Importantly, packages that you depend on don't need to anticipate being compiled with jsoo, or do anything special. They don't even need test the jsoo workflow.
Note: currently choices based on the representation (or not) of constructors are made during the translation from Typedtree to Lambda, in particular during pattern-matching compilation. If we wanted to keep the Lambda representation-agnostic, we would need to use higher-level makeblock and switch constructs (with constructors instead of constants and tag values), which would only be lowered later (as we do for string switches currently). This is possible, but a sizeable refactoring, sensibly more work I suspect than implementing the corresponding part of the proposal.
@gasche and me discussed about https://github.com/coq/coq/pull/12733 and found that this RFC would be an elegant solution to this problem. Essentially, they would need an ability to discriminate over the closure tag in an OCaml pattern matching (Obj.tag is too slow for this application, so this is not an option here). (A small extension of) This RFC makes this possible by allowing to unbox functions in an ADT. (A subtlety is that if we allow to unbox functions, then the corresponding constructor would correspond to two tags --the closure tag and the infix tag--, but that does not seem to be really complicated to handle.)
The way the Coq folks solved this problem in the past is by changing the tag of the closure to 0 so that this special kind of closures (accumulators) had the same tag as a special ADT constructor which was easy to discriminate using a pattern matching. However, this is not compatible with the new closure representation in no naked pointers mode, hence they need a new solution.
As far as I understand, for this proposal to be useful for Coq, the unboxed attribute would need to be per constructor. Also, this is not just an optimization in the case of Coq, as the program would instantly crash (hopefully), if the compiler decided to ignore the attribute and put an indirection. That said, ultimately, the tag attribute is much closer to the semantics Coq expects.
Currently:
type ind_foo =
| Accu_foo of t (* this constructor is a lie, just to make sure that tag 0 is free for storing a closure *)
| Construct_foo_0 of t * t * ...
| Construct_foo_1 of t * t * ...
| Int_foo_0
| Int_foo_1
With unboxed:
type ind_foo =
| Accu_foo of t -> t [@@unboxed] (* the actual type of the closure is infinite: t -> t -> t -> ... *)
| Construct_foo_0 of t * t * ...
| Construct_foo_1 of t * t * ...
| Int_foo_0
| Int_foo_1
With tag:
type ind_foo =
| Accu_foo of t [@@tag 247] (* the type is still a lie, but the intent is clear *)
| Construct_foo_0 of t * t * ...
| Construct_foo_1 of t * t * ...
| Int_foo_0
| Int_foo_1
As far as I understand, for this proposal to be useful for Coq, the unboxed attribute would need to be per constructor.
Indeed, you would need some way to do a fine-grained control of which constructor is unboxed. Perhaps this could be done by choosing well the content of the other constructors?
That said, ultimately, the tag attribute is much closer to the semantics Coq expects.
@silene : if such a tag attribute were implemented, then it is very unlikely that the closure tag will be allowed at this place, since the GC expects some particular memory layout when it sees the closure tag.
| Accu_foo of t -> t [@@unboxed] (* the actual type of the closure is infinite: t -> t -> t -> ... *)
If t -> t -> t -> ... is really your intent for the inner type of the constructor, you can still define type u = (t -> u) with -rectypes But note that this is (again) a lie since the accumulator can be used as a constructor even in the future.
Indeed, you would need some way to do a fine-grained control of which constructor is unboxed. Perhaps this could be done by choosing well the content of the other constructors?
None of the other constructors takes a function as an argument, so I guess this is fine. In the end, the compiler can unbox any constructor it wants, as long as Accu_foo is guaranteed to be unboxed and a constructor Constructor_foo of t is guaranteed not to be. The latter could be avoided by turning t into t * t, but that would have a non-negligible memory footprint.
if such a tag attribute were implemented, then it is very unlikely that the closure tag will be allowed at this place, since the GC expects some particular memory layout when it sees the closure tag.
That is the whole point, isn't? Coq is being mentioned in this discussion precisely because OCaml's GC will soon expect some specific layout for (non-)closuresclosures, so Coq can no longer create closures with tag 0.
But if you are concerned with safety, the solution would be for the compiler to forbid the use of Accu_foo for creating values or anything other than a wildcard for pattern-matching. Note that this is perfectly fine with Coq, since the only occurrence of the constructor is in the construct match Obj.magic x with Accu_foo _ -> ... | ... -> ....
If
t -> t -> t -> ...is really your intent for the inner type of the constructor,
No, I just wanted to make it clear for the readers that t -> t is an approximation of the type, since the closure representing an accumulator accepts an arbitrary large number of arguments. From the point of view of enabling unboxing, any type that looks like a function, e.g., t -> t, should be fine.
To summarize my comment at https://github.com/coq/coq/pull/12733#issuecomment-674741700 : perhaps a fast is_closure test suffices to get OK performance; I would need evidence that it doesn't suffice before embarking on the [@@tag 247] approach.
if such a tag attribute were implemented, then it is very unlikely that the closure tag will be allowed at this place, since the GC expects some particular memory layout when it sees the closure tag.
That is the whole point, isn't? Coq is being mentioned in this discussion precisely because OCaml's GC will soon expect some specific layout for (non-)closuresclosures, so Coq can no longer create closures with tag 0.
But if you are concerned with safety, the solution would be for the compiler to forbid the use of Accu_foo for creating values or anything other than a wildcard for pattern-matching. Note that this is perfectly fine with Coq, since the only occurrence of the constructor is in the construct match Obj.magic x with Accu_foo _ -> ... | ... -> ....
Yes, that's the whole point of this RFC, but, in this RFC, in addition to providing more control over memory layout, we don't want to loose any safety guarantee. If the feature is unsafe or requires using Obj either to create or use values, then I don't think this is satisfying. As far as I understand your tag attribute proposal, it would require using Obj to create values which are compatible with the GC requirements.
Hi, I do not know if it perfectly fits in this RFC, but I think it is at least related.
I would appreciate being able to specify that a constructor argument should be "mixed/inlined" within the constructor itself.
So for instance, in the case of simple expression:
type op = Add | Mul
type t = Cst of int | Neg of t | Op of (op [@product]) * t * t
The annotation [@product] will make the cartesian product of Op and op:
type t = Cst of int | Neg of t | OpAdd of t * t | OpMul of t * t
Thus, matching on "high level" Op will simply be the or patter of every derived constructor. Extracting the value Add from OpAdd should be trivial since it would be a simple affine transformation of the tag.
I think it could also work if the mixed type contains non constant constructor but I am not sure if it would be interesting (extracting the value back will need a fresh allocation with copy). May be should it apply only on the constant constructors of the type.