mojo
mojo copied to clipboard
[Feature Request] Named output arguments for explicit NRVO (and a new syntax for output parameters)
Review Mojo's priorities
- [X] I have read the roadmap and priorities and I believe this request falls within the priorities.
What is your request?
Mojo is a performance-focused language, so it's valuable to eliminate unnecessary copies and moves. Furthermore, some types are not movable at all. Accordingly, it would be great if Mojo offered a way for function implementers to directly initialize a return slot provided by the caller.
In some languages (such as C#), this is achieved using out
arguments, but these are awkward to work with, because they require callers to provide an Lvalue to hold the result.
Here is my proposed alternative:
fn foo() -> (result: Bar):
result = Bar(...)
do_something(result)
return result
let x = foo()
Above, the return slot has been given an explicit name result
. When execution begins, it is uninitialized, just like an out
argument. It must be initialized before the function returns. Otherwise, the compiler should emit an error.
With this syntax, Mojo can achieve what is known as "guaranteed NRVO" in the C++ community, without requiring any compiler magic.
We can also support multiple named return slots:
fn foo() -> (x: Bar, y: Bar):
This syntax is different from Python's syntax for tuple types. Those are written as tuple[Bar, Bar]
.
Design alternatives
Because the return slot has been explicitly named, it's not strictly necessary for the return
statement to include the variable name. In fact, it might even be a little bit misleading, because it suggests that it would be valid to change the variable being returned, e.g. to change return result
to return blah
. If this was allowed, it would be equivalent to writing result = blah; return result
. I'm not sure whether this should be supported. It probably shouldn't be, because the whole purpose of NRVO is to avoid copying/moving, and return blah
would need to do an implicit copy.
Accordingly, perhaps if the return slot has been named, return statements must not be followed by an expression, i.e. one must simply write return
, or omit the statement entirely if an early return is not needed. This syntax would be consistent with how inout
works (and how out
works in C#): you smuggle data out of the function by assigning to the variable, rather than via a return statement.
If a function raises an exception, then any value that has been assigned to a return slot should be destroyed and discarded.
Constructors
Currently, Mojo constructors accept an inout self
argument. This is handled with "compiler magic", because at the beginning of a call, self
is actually uninitialized. (If Mojo had out
arguments, we would be writing out self
.) Also, the type signatures of today's constructors are inconsistent with the syntax for invoking constructors. The signatures suggest that constructors are instance methods, even though they are not invoked using instance.method()
syntax. (In Python, when you invoke a constructor, you are actually invoking __new__
, which partially initializes an object and then hands it off to __init__
.)
Named return slots can resolve both of these issues:
struct Foo:
var x: Int32
fn __init__() -> (self):
self.x = 0
I think it would be worthwhile making the parentheses around self
mandatory, for consistency with the general case where a type annotation is required. Parentheses prevent the type-annotation colon from being confused with the block-introducing colon. For example, -> result: Bar:
is difficult for both humans and computers to parse.
Terminology, and generalization to parameters
(Added Jan 2024)
UPDATE: Mojo has since removed result parameters, so this section is no longer relevant.
Given that Mojo also supports returning values at the specialization phase of a function's execution, the phrase "named return slot" is not sufficient to unambiguously describe this feature. We probably need to use a term like "output argument" instead (as the counterpart to "input argument").
It then also makes sense to extend this syntax to output parameters as well:
fn f[...](...) -> [a: Foo, b: Foo](x: Bar, y: Bar):
If output parameters can be initialized using their names (e.g. a = Foo()
), perhaps the param_return
syntax would no longer be necessary? Or maybe we could support anonymous return parameters as well. The following syntax might work:
fn f[...](...) -> [Int](String, String):
If we go in this direction, we probably wouldn't be able to support -> (self)
syntax. It would have to be written -> (self: Self)
.
We could even consider allowing arbitrary combinations of named and anonymous outputs:
fn a[...](...) -> [size: Int](result: String):
fn b[...](...) -> [size: Int] String:
fn c[...](...) -> [Int](result: String):
fn d[...](...) -> [Int] String:
But within an output parameter (or argument) list, we would need to demand that all of the items be either named or anonymous, because otherwise the syntax becomes confusing:
# The following signature misleadingly suggests that `Foo`, `Bar`, and `Type` are all
# being declared as type parameters, but actually, the first two items are meant to
# be anonymous parameters whose type is `Foo` and `Bar` respectively.
fn foo() -> [Foo, Bar, Type: AnyType]:
# The following is also weird and confusing, because `Type` is a named parameter, whereas
# `Foo` and `Bar` are not parameters—they are the *types* of anonymous parameters.
fn foo() -> [Type: AnyType, Foo, Bar]:
In short, there's a lot of design alternatives here. We might want to start with something very conservative.
Update: In fact, given that Mojo doesn't allow the use of control flow constructs during the specialization-phase of a function (except indirectly, via function calls), it probably makes the most sense to get rid of param_return
, because that is a control flow construct. This would force all output parameters to be named, and initialized via an explicit assignment. This also allows us to avoid the syntax challenges discussed above.
If we took this approach, only return arguments could be declared as anonymous. We could use parentheses (or the lack thereof) as the cue to distinguish between anonymous arguments and named arguments:
# Named
fn foo() -> (a: Int, b: Int):
# Anonymous
fn foo() -> Int, Int:
# We could allow a mixture, but I don't see a reason to support this
fn foo() -> (a: Int) Int:
Side note: Named output parameters can be used to specify existential return types:
fn f(...) -> [T: SomeTrait](result: T):
Side note: The Mojo docs currently use the term "result parameter", but I think "output parameter" (alongside "output argument") works a bit nicer, since "output" is the natural inverse of "input". A function signature places inputs on the left of ->
, and outputs on the right. (We also have the concept of an inout
argument, which is simultaneously an input and an output.)
This is a really great idea, thank you for filing this. This would eliminate the need to have "guaranteed NRVO" and make it a performance optimization instead. I also appreciate how it consolidates the memory only constructors into a more logical model, as you mentioned on discord, "inout" is a lie... it's really like an out argument.
Slightly off-topic, can we use similar syntax for return parameter? It's ugly in the same way as out
variables.
alias out: Int
return_int[() -> out]()
As @soraros pointed out, this syntax extends naturally to result parameters. I added a section to the end of my original post demonstrating this.
Serendipitously, it turns out that this syntax can also be used to specify existential return types!
@nmsmith I'm more concerned about the call site. How do we bind names to these result parameters?
Given the following function signature:
fn foo(x: SomeType, y: SomeType) -> [T: SomeTrait](result: T):
I'd imagine the syntax for assigning outputs to variables would look something like this:
alias Type: SomeTrait, var result: Type = foo(x, y)
Or with type inference:
alias Type, var result = foo(x, y)
We might also want to allow users to ignore the alias
outputs, and jump straight to the var
:
var result = foo(x, y)
For this specific example, the ignored type should probably be accessible via the syntax foo.T
. So one could write:
var result: foo.T = foo(x, y)
More generally, the output parameters of a parametric function can be made accessible via the syntax f[a, b, ...].param
. In the case of output types, this syntax could potentially as the canonical name/representation of the type. Going back to the foo
example, this would allow the programmer and the compiler to reason that foo(a, b)
and foo(c, d)
have the same type.
If there are multiple output parameters or output arguments I'd imagine you'd just keep repeating the keywords:
alias Type1, alias Type2, var result1, var result2 = ...
Maybe each keyword only needs to be uttered once. However, that raises the question of what syntax should be used to express that you want to reassign an existing variable, rather than initialize a new one. The same issue occurs for declarations like var x, y = 1, 2
(which Mojo doesn't currently support).
So there are a few design details that need attention, but I don't see any roadblocks.
Chatting with Nick a bit offline, result parameters have been returned, but I still think the fn foo() -> (result: Bar):
syntax is obvious and sensible.
@lattner Does this prevent us from using the syntax for variadic generics? Example taken from Swift.
func makeTuple<T...>(_ t: T...) -> (t: T...)
FYI, we discussed this a bit today (July 15, 2024) in the Mojo Community meeting
Go (aka Golang) does this with one addition: when defining a named return value in the function definition, the variable is also initialized with that type's default value.
func demo(user_input int) (x, y int) {
x = sum * 4 / 9
y = sum - x
return // <- this "implicit return" is a Go thing; I personally prefer explicit {{ i.e.: return (x, y) }}, like in the proposal
}
Essentially, the Go named result variables do not need initialization inside the function.
@Mogball mentioned on the community call that syntax issues with this NRVO proposal are being worked through right now, so this seems like a good time for me to bring up a related issue that I have.
Mojo currently treats the syntax (A, B)
as a valid syntax for a tuple type. For example, you can write a function with signature fn foo() -> (A, B):
.
I would like to emphasise: this is inconsistent with Python's syntax for tuple types. Python's tuple class is named tuple
, and it's a parametric class, so tuple types are written tuple[A, B]
in Python. This is Python's official and widely adopted syntax for tuple types. Crucially, (A, B)
is NOT considered sugar for tuple[A, B]
. The only standard interpretation of (A, B)
is as a tuple literal, whose type is tuple[type, type]
. You can survey the standard Python type checking libraries to verify that this is the case. For example, mypy will return a type error if you try and annotate a tuple with the type (A, B)
. It suggests to use tuple[A, B]
instead.
I thought I'd mention this on this thread, because the NRVO proposal uses the syntax -> (x: A, y: B)
for named return slots. This seems similar to Mojo's current syntax for tuple types, and so some might consider this to be a problem with the NRVO syntax (which obviously needs to be distinct from tuples). Given the points I've mentioned above, I would like to suggest that we view this not as a problem with the NRVO syntax, but as a problem with Mojo's tuple type syntax. If Mojo avoids offering -> (A, B)
for tuple types, then there are fewer parsing ambiguities when the (
token is encountered after ->
in the parser, because tuple literals are not valid in type position.
More broadly, we can see that requiring the syntax Tuple[A, B]
for Mojo tuples is consistent with the syntax for all of the other standard collection types in Python and Mojo:
- A list literal is written
[2]
and its type is writtenListLiteral[Int]
orList[Int]
orlist[int]
, not[Int]
. - A set literal is written
{2}
(once Mojo supports it) and its type is writtenSet[Int]
orset[int]
, not{Int}
. - A dict literal is written
{'x': 2}
(once Mojo supports it) and its type is writtenDict[String, Int]
ordict[string, int]
, not{String: Int}
. - A tuple literal is written
(2, 3)
and its type is writtenTuple[Int, Int]
ortuple[int, int]
, not(Int, Int)
.
I already raised #1863 for the syntax.
Let's keep this topic focused on named return values :).
@dimitrilw : go's "initialize to zero" approach makes sense given Go's general philosophy on values and it is particularly useful with its specific error handling approach, but it would work better in Mojo for the result to be uninitialized. This feature is mostly useful for code that wants to control very low level details when you have non-movable types etc, not things like integers.
Hey @nmsmith thanks for chiming in. Yeah it's the use of tuple literal syntax to represent a tuple type in the return slot that's causing issues, and in fact it make the (Int, Int)
syntax contextual, because it would normally mean Tuple[type, type]
where the values are Int
and Int
. @lattner what are your thoughts on this?
@Mogball https://github.com/modularml/mojo/issues/1863#issuecomment-2044048924
^^, I'd suggest splitting out discussion of tuple syntax to its own issue :-)
@lattner Our point is that these issues are intertwined, because NRVO and tuples are fighting for the same position in the grammar. Only one syntax can win 😎.
(But I'll put my tuple-centric comments in the other thread, from now on.)
We ultimately decided on fn foo() -> String as output:
as the syntax to move this feature forward. This will be included in the next release! (Or tonight's nightly). Thank you @nmsmith for the proposal and everyone for the discussion!
We can continue to bikeshed the syntax if anyone feels strongly about this. This one is relatively easy to regex find+replace
incidentally, the "Type as name" syntax was chose to align with python pattern syntax.
the "Type as name" syntax was chose to align with python pattern syntax.
If you're referring to the pattern syntax case Point() as p:
, then the syntax you've chosen doesn't actually match, because Point()
is the syntax for a constructor, not the syntax for a type. If you instead write case Point as p:
then Point
is treated as a wildcard that matches anything, and both Point
and p
become aliases for the matched expression.
However, Python has the syntax except Exception as e:
whose semantics is delightfully inconsistent 🥲 with the semantics of patterns. The -> String as output:
syntax is arguably consistent with this syntax for except
clauses.
Anyway I'm glad to see this feature land. 🥳 This is a really clean way to eliminate unnecessary moves, and to initialize non-movable types.
Fair point. The syntax isn't finalized, there is discussion about making an "init" keyword the preferred spelling, though I'm not fully up to speed on what the final outcome wll be.