RFCs
RFCs copied to clipboard
Tuple projections
TL;DR: Adds labeled and (0-indexed) unlabeled tuple projections:
# let x = (~tuple:42, ~proj:1337, "is", 'c', 00, 1);;
val x : (tuple: int * proj:int * string * char * int * int) =
(~tuple:42, ~proj:1337, "is", 'c', 0, 1)
# x.tuple;;
- : int = 42
# x.3;;
- : char = 'c'
As a whole I'm strongly in favor of this feature because I consider that it would increase the quality of life when handling tuples.
I also find the part about row-polymorphic tuples to be extremely interesting as it actually increases the expressiveness of OCaml's polymorphism. However I suspect this would require tweaking a few things in the back-end but hopefully not much.
My personal impression is that row-polymorphic tuples are the most interesting part of the proposal, which is still useful independently of the rest of proposal and interact quite strongly with the design of projections.
Thus I have the impression that it could make sense to focus on that part first, possibly in a separate RFC?.
The discussion of row-polymorphic tuples in this RFC is meant only as a consideration of an alternative typing approach for tuple projections, originally suggested in https://github.com/ocaml/ocaml/pull/14257#issuecomment-3336583144. This RFC does not propose introducing row-polymorphic tuples for unlabelled tuple projections; it discusses them solely to ensure that this RFC's proposed type-based disambiguation approach to typing projections remains compatible with that possible future direction.
If there is a consensus among maintainers that row-polymorphic tuples are desirable, I agree that they should be explored in a separate RFC. However, they are not part of the present proposal and remain outside the scope of this RFC.
Hey 👋 Following up to see if there's anything I need to do on my side. If not, what are the next steps to move this RFC forward?
We had a meeting of OCaml maintainers today, and we discussed your RFC. The discussion was lively and rather subtle, but we reached the following consensus (not everyone agrees with everything below, but we felt that there was a general agreement to follow this proposal):
- some maintainers are not convinced by tuple projections if the type has to be known (enough to block adding the feature if designed in this way)
- but we would welcome tuple projections if we had tuple polymorphism:
fun x -> x.1 : ('a * 'b * ..) -> 'bin particular this could give a more expressive type tofstandsnd, and this is really convincing. - labelled projections when the type is known is also a welcome feature
(I tried to call tuples like 'a * 'b * ... "flexible tuples", in reference to SML's flexible records, but the name did not catch on in the discussion and people talk about "tuple polymorphism" instead. I would still use "flexible tuples" but oh well.)
This consensus is in fact remarkably similar to the opinion expressed by @garrigue on the PR, but it has the benefit of coming with a set of design parameters for which we collectively agree to add the feature to the language, if you are willing to implement it.
Thanks for the update! 🙏
A few thoughts from my side:
- I'm still not convinced by the 'tuple polymorphism' direction. It's a large change to the typer, and it opens questions I'm not particularly keen to answer -- e.g.
x.99999happily generating an arbitrarily large tuple type - I do see the appeal of giving
fst/sndmore general types today, but if we eventually add something like first-class projection functions (e.g..0or_.0forfst, as in https://github.com/ocaml/ocaml/pull/11407), that benefit disappears. - The real issue with the current proposal (and implementation) is its dependence on known type information, and OCaml is simply not very good at propagating such information. Rather than adding another mechanism for row-like polymorphism, I'd prefer to improve the propagation of known type information[^1]. This would put the proposed approach on par (and then some!) with how SML-style projections behave (detailed here[^1]), and would benefit other features that currently rely on type-based disambiguation / known type information.
If nothing in my response is able to sway the maintainers, then I will accept that we have consensus around labelled tuple projections and adjust my implementation to support that feature only.
[^1]: Shameless self plug: here is an approach that would improve on OCaml's current approach
My understanding of the majority opinion that (numeric) tuple projections should not require type information is as follows: in everyone's mental model of OCaml untyped semantics, tuple access x.2 is a parametric operation that works exactly the same on all tuples of at least a given size. So people don't like the idea that let fst x = x.0;; would be rejected, because their intuition tells them that x.0 is not an ambiguous operation, it always means the same thing. (And they can easily think of a way to give it a most general type, using flexible types.)
In contrast, mental models of untyped semantics for labelled tuples differ. If you view them as associative lists, then the same argument would apply (but in that model tuple access has a linear access time...). But many people see labelled tuples as represented by non-labelled tuples at runtime (in their mental untyped semantics), with an ordering between labels decided at type-checking time (in other words, x.foo "desugars" into say x.3 at type-checking time) -- and thus constant-time tuple access. With this mental model they are not surprised that x.foo is rejected in absence of type information, it is a limitation of the compilation model that they have internalized.
The fact that a significant portion of maintainers have this view (that unlabelled tuple access should be treated polymorphically, and that failing in type-undetermined setting is surprising and counter-intuitive) suggests that it would also be widespread among users.
I think it would be reasonable for you to work on labelled tuple projections only. Note that while people were, in general, enthusiastic about the idea of having flexible tuples and numeric tuple indexing, there was less of an enthusiasm about labelled projections only (I asked during the meeting whether people are convinced that it will make a difference in practice in how we write programs, and in general people were not convinced that it would). My sense is that people are okay with that feature in isolation, but they don't consider it very important.
My understanding of the majority opinion that (numeric) tuple projections should not require type information is as follows: in everyone's mental model of OCaml untyped semantics, tuple access
x.2is a parametric operation that works exactly the same on all tuples of at least a given size. So people don't like the idea thatlet fst x = x.0;;would be rejected, because their intuition tells them thatx.0is not an ambiguous operation, it always means the same thing. (And they can easily think of a way to give it a most general type, using flexible types.)
I'm not convinced that the untyped-semantics intuition is the right basis for deciding what the type checker should accept. If it were, then fun f -> f f would be unambiguous: application is just as parametric as tuple access in UTLC. Yet we all know that there are many different (and non-trivial) ways to try to assign a type to f f (equi-recursive types, first-class polymorphism to name a few), and OCaml sensibly rejects them today without explicit annotations (or feature flags).
When we consider a typed language, the complexity of inference (both in terms of time/space and engineering effort, and in some cases decidability) becomes part of the design space. As such the "it works in the untyped mental model" doesn't, by itself, carry much weight as an argument for typability.
In contrast, mental models of untyped semantics for labelled tuples differ. If you view them as associative lists, then the same argument would apply (but in that model tuple access has a linear access time...). But many people see labelled tuples as represented by non-labelled tuples at runtime (in their mental untyped semantics), with an ordering between labels decided at type-checking time (in other words,
x.foo"desugars" into sayx.3at type-checking time) -- and thus constant-time tuple access. With this mental model they are not surprised thatx.foois rejected in absence of type information, it is a limitation of the compilation model that they have internalized.
I agree with you here: labelled tuples are nothing but a type-directed compilation strategy for unlabelled tuples.
The fact that a significant portion of maintainers have this view (that unlabelled tuple access should be treated polymorphically, and that failing in type-undetermined setting is surprising and counter-intuitive) suggests that it would also be widespread among users.
I think that the inductive step here is ill-founded. To my knowledge (and correct me if I'm wrong), no mainstream functional programming language implements tuple polymorphism. SML has structural records, but failed to typecheck fun fst x = #0 x. To me, the absence of tuple polymorphism across existing designs suggests that the feature may not be as natural as you suggest.
Ultimately, my hesitancy around tuple polymorphism stems from complexity: implementing tuple polymorphism looks like a fairly large and relatively invasive to an already complex type checker. And for what is, in practice, a modest quality-of-life improvement; I'm not convinced that this trade-off is worth it. By contrast, the current change to the type checker is self-contained and relatively maintainable. The main practical downside (it's reliance on known type information) is something we could address more directly (with benefit to other more well-used features).
You of course should not feel pressured to implement flexible tuples; but the consensus decision is that we will not add unlabeled/numeric tuple projections without them.
I don't have a strong opinion about tuple projections, but I thought it might be useful to explain why we didn't do something like tuple polymorphism in the initial labeled tuples implementation. Mainly, the reason is that I don't see how to implement it well. I'll expand:
Let's consider snd:
val snd : ('a * 'b * ...) -> 'b
What does this function do at runtime? In particular, how does it locate the field it wants to project in the block?
I think there are three possible answers. The first is that there is some kind of dynamic lookup to map labels and unlabeled indexes to dynamic positions in the block. At that point tuples are basically objects, and I think I would be sad to lose the cheaper anonymous product representation.
The second is that there must be a canonical way to locate the second unlabeled field. Maybe the labeled fields come after all the unlabeled fields? But how do we then compile projection of a labeled field - how do we know how many unlabeled fields come before it?
The answer in SML is to reject such functions. That is, fun snd x = x#2 yields this error:
Elaboration failed: Unresolved record type: Multiple non-agreeing types possible.
The third is that the ... can't be filled in with labeled fields and we go with some canonical ordering. I think this limitation on the polymorphism is not ideal, and would also be a bit sad about a canonical ordering of labeled fields - users might want different orderings for cache reasons (and there are additional reasons to care about the order in OxCaml, where we have fields that are less than a word wide we may be able to pack tightly).
Possibly I'm missing some nice compilation strategy your discussion covered! I'd love to hear more about it.
Good question!
One possible answer is that ('a * 'b * ..) can be completed with labelled tuples, but that the first two elements of the tuple must be non-labelled: this can be instantiated into (int * bool * bar:char) but not (int * bar:char * bool).
The downside of this is that it implies that ('a * 'b * ..) -> 'b is not in fact the principal type of .1, so there is an "inference cliff" depending on whether the type is known (and thus can have labelled fields before the projected element) or not (and thus it cannot). Note that we already have an inference cliff with .foo under the proposal, which can project out of a labelled tuple if the type is known, but will fallback to the last record field in scope if it is not.
It would also be possible to always interpret .1 as the second field, labelled or not, and thus give a weird polymorphic type ('f:'a * 'g:'b * ...) -> 'b where 'f: can be instantiated into an optional label. I don't think that people would appreciate such types however, the argument in favor of flexible/polymorphic tuple was that the inferred types look fairly natural, but this one clearly does not.
We could introduce some row-like syntax for the head of the tuple e.g. infer (.. * 1:'a * ..) -> 'a for .1
You could infer something similar for labelled tuples e.g. (.. * foo:'a * ..) -> 'a for .foo, but with the additional condition that the row 'head' must be instantiated to a concrete row in order to compute the offset for foo.
Something like this would also provide nicer typing for labelled tuple patterns since the check that the row head is instantiated to a concrete row could be delayed until generalization.