mojo icon indicating copy to clipboard operation
mojo copied to clipboard

[Feature Request] Named output arguments for explicit NRVO (and a new syntax for output parameters)

Open nmsmith opened this issue 1 year ago • 7 comments

Review Mojo's priorities

What is your request?

Mojo is a performance-focused language, so it's valuable to eliminate unnecessary copies and moves. Furthermore, some types are not movable at all. Accordingly, it would be great if Mojo offered a way for function implementers to directly initialize a return slot provided by the caller.

In some languages (such as C#), this is achieved using out arguments, but these are awkward to work with, because they require callers to provide an Lvalue to hold the result.

Here is my proposed alternative:

fn foo() -> (result: Bar):
    result = Bar(...)
    do_something(result)
    return result
    
let x = foo()

Above, the return slot has been given an explicit name result. When execution begins, it is uninitialized, just like an out argument. It must be initialized before the function returns. Otherwise, the compiler should emit an error.

With this syntax, Mojo can achieve what is known as "guaranteed NRVO" in the C++ community, without requiring any compiler magic.

We can also support multiple named return slots:

fn foo() -> (x: Bar, y: Bar):

This syntax is different from Python's syntax for tuple types. Those are written as tuple[Bar, Bar].

Design alternatives

Because the return slot has been explicitly named, it's not strictly necessary for the return statement to include the variable name. In fact, it might even be a little bit misleading, because it suggests that it would be valid to change the variable being returned, e.g. to change return result to return blah. If this was allowed, it would be equivalent to writing result = blah; return result. I'm not sure whether this should be supported. It probably shouldn't be, because the whole purpose of NRVO is to avoid copying/moving, and return blah would need to do an implicit copy.

Accordingly, perhaps if the return slot has been named, return statements must not be followed by an expression, i.e. one must simply write return, or omit the statement entirely if an early return is not needed. This syntax would be consistent with how inout works (and how out works in C#): you smuggle data out of the function by assigning to the variable, rather than via a return statement.

If a function raises an exception, then any value that has been assigned to a return slot should be destroyed and discarded.

Constructors

Currently, Mojo constructors accept an inout self argument. This is handled with "compiler magic", because at the beginning of a call, self is actually uninitialized. (If Mojo had out arguments, we would be writing out self.) Also, the type signatures of today's constructors are inconsistent with the syntax for invoking constructors. The signatures suggest that constructors are instance methods, even though they are not invoked using instance.method() syntax. (In Python, when you invoke a constructor, you are actually invoking __new__, which partially initializes an object and then hands it off to __init__.)

Named return slots can resolve both of these issues:

struct Foo:
    var x: Int32

    fn __init__() -> (self):
        self.x = 0

I think it would be worthwhile making the parentheses around self mandatory, for consistency with the general case where a type annotation is required. Parentheses prevent the type-annotation colon from being confused with the block-introducing colon. For example, -> result: Bar: is difficult for both humans and computers to parse.

Terminology, and generalization to parameters

(Added Jan 2024)

UPDATE: Mojo has since removed result parameters, so this section is no longer relevant.

Given that Mojo also supports returning values at the specialization phase of a function's execution, the phrase "named return slot" is not sufficient to unambiguously describe this feature. We probably need to use a term like "output argument" instead (as the counterpart to "input argument").

It then also makes sense to extend this syntax to output parameters as well:

fn f[...](...) -> [a: Foo, b: Foo](x: Bar, y: Bar):

If output parameters can be initialized using their names (e.g. a = Foo()), perhaps the param_return syntax would no longer be necessary? Or maybe we could support anonymous return parameters as well. The following syntax might work:

fn f[...](...) -> [Int](String, String):

If we go in this direction, we probably wouldn't be able to support -> (self) syntax. It would have to be written -> (self: Self).

We could even consider allowing arbitrary combinations of named and anonymous outputs:

fn a[...](...) -> [size: Int](result: String):

fn b[...](...) -> [size: Int] String:

fn c[...](...) -> [Int](result: String):

fn d[...](...) -> [Int] String:

But within an output parameter (or argument) list, we would need to demand that all of the items be either named or anonymous, because otherwise the syntax becomes confusing:

# The following signature misleadingly suggests that `Foo`, `Bar`, and `Type` are all
# being declared as type parameters, but actually, the first two items are meant to
# be anonymous parameters whose type is `Foo` and `Bar` respectively.
fn foo() -> [Foo, Bar, Type: AnyType]:
# The following is also weird and confusing, because `Type` is a named parameter, whereas
# `Foo` and `Bar` are not parameters—they are the *types* of anonymous parameters.
fn foo() -> [Type: AnyType, Foo, Bar]:

In short, there's a lot of design alternatives here. We might want to start with something very conservative.

Update: In fact, given that Mojo doesn't allow the use of control flow constructs during the specialization-phase of a function (except indirectly, via function calls), it probably makes the most sense to get rid of param_return, because that is a control flow construct. This would force all output parameters to be named, and initialized via an explicit assignment. This also allows us to avoid the syntax challenges discussed above.

If we took this approach, only return arguments could be declared as anonymous. We could use parentheses (or the lack thereof) as the cue to distinguish between anonymous arguments and named arguments:

# Named
fn foo() -> (a: Int, b: Int):
# Anonymous
fn foo() -> Int, Int:
# We could allow a mixture, but I don't see a reason to support this
fn foo() -> (a: Int) Int:

Side note: Named output parameters can be used to specify existential return types:

fn f(...) -> [T: SomeTrait](result: T):

Side note: The Mojo docs currently use the term "result parameter", but I think "output parameter" (alongside "output argument") works a bit nicer, since "output" is the natural inverse of "input". A function signature places inputs on the left of ->, and outputs on the right. (We also have the concept of an inout argument, which is simultaneously an input and an output.)

nmsmith avatar Dec 10 '23 04:12 nmsmith

This is a really great idea, thank you for filing this. This would eliminate the need to have "guaranteed NRVO" and make it a performance optimization instead. I also appreciate how it consolidates the memory only constructors into a more logical model, as you mentioned on discord, "inout" is a lie... it's really like an out argument.

lattner avatar Dec 10 '23 23:12 lattner

Slightly off-topic, can we use similar syntax for return parameter? It's ugly in the same way as out variables.

alias out: Int
return_int[() -> out]()

soraros avatar Dec 11 '23 01:12 soraros

As @soraros pointed out, this syntax extends naturally to result parameters. I added a section to the end of my original post demonstrating this.

Serendipitously, it turns out that this syntax can also be used to specify existential return types!

nmsmith avatar Jan 20 '24 03:01 nmsmith

@nmsmith I'm more concerned about the call site. How do we bind names to these result parameters?

soraros avatar Jan 20 '24 12:01 soraros

Given the following function signature: fn foo(x: SomeType, y: SomeType) -> [T: SomeTrait](result: T):

I'd imagine the syntax for assigning outputs to variables would look something like this: alias Type: SomeTrait, var result: Type = foo(x, y)

Or with type inference: alias Type, var result = foo(x, y)

We might also want to allow users to ignore the alias outputs, and jump straight to the var: var result = foo(x, y)

For this specific example, the ignored type should probably be accessible via the syntax foo.T. So one could write: var result: foo.T = foo(x, y)

More generally, the output parameters of a parametric function can be made accessible via the syntax f[a, b, ...].param. In the case of output types, this syntax could potentially as the canonical name/representation of the type. Going back to the foo example, this would allow the programmer and the compiler to reason that foo(a, b) and foo(c, d) have the same type.

If there are multiple output parameters or output arguments I'd imagine you'd just keep repeating the keywords: alias Type1, alias Type2, var result1, var result2 = ...

Maybe each keyword only needs to be uttered once. However, that raises the question of what syntax should be used to express that you want to reassign an existing variable, rather than initialize a new one. The same issue occurs for declarations like var x, y = 1, 2 (which Mojo doesn't currently support).

So there are a few design details that need attention, but I don't see any roadblocks.

nmsmith avatar Jan 21 '24 07:01 nmsmith

Chatting with Nick a bit offline, result parameters have been returned, but I still think the fn foo() -> (result: Bar): syntax is obvious and sensible.

lattner avatar Feb 03 '24 05:02 lattner

@lattner Does this prevent us from using the syntax for variadic generics? Example taken from Swift.

func makeTuple<T...>(_ t: T...) -> (t: T...)

soraros avatar Feb 05 '24 02:02 soraros

FYI, we discussed this a bit today (July 15, 2024) in the Mojo Community meeting

lattner avatar Jul 15 '24 17:07 lattner

Go (aka Golang) does this with one addition: when defining a named return value in the function definition, the variable is also initialized with that type's default value.

func demo(user_input int) (x, y int) {
	x = sum * 4 / 9
	y = sum - x
	return // <- this "implicit return" is a Go thing; I personally prefer explicit {{ i.e.: return (x, y) }}, like in the proposal
}

Essentially, the Go named result variables do not need initialization inside the function.

dimitrilw avatar Jul 15 '24 17:07 dimitrilw

@Mogball mentioned on the community call that syntax issues with this NRVO proposal are being worked through right now, so this seems like a good time for me to bring up a related issue that I have.

Mojo currently treats the syntax (A, B) as a valid syntax for a tuple type. For example, you can write a function with signature fn foo() -> (A, B):.

I would like to emphasise: this is inconsistent with Python's syntax for tuple types. Python's tuple class is named tuple, and it's a parametric class, so tuple types are written tuple[A, B] in Python. This is Python's official and widely adopted syntax for tuple types. Crucially, (A, B) is NOT considered sugar for tuple[A, B]. The only standard interpretation of (A, B) is as a tuple literal, whose type is tuple[type, type]. You can survey the standard Python type checking libraries to verify that this is the case. For example, mypy will return a type error if you try and annotate a tuple with the type (A, B). It suggests to use tuple[A, B] instead.

I thought I'd mention this on this thread, because the NRVO proposal uses the syntax -> (x: A, y: B) for named return slots. This seems similar to Mojo's current syntax for tuple types, and so some might consider this to be a problem with the NRVO syntax (which obviously needs to be distinct from tuples). Given the points I've mentioned above, I would like to suggest that we view this not as a problem with the NRVO syntax, but as a problem with Mojo's tuple type syntax. If Mojo avoids offering -> (A, B) for tuple types, then there are fewer parsing ambiguities when the ( token is encountered after -> in the parser, because tuple literals are not valid in type position.

More broadly, we can see that requiring the syntax Tuple[A, B] for Mojo tuples is consistent with the syntax for all of the other standard collection types in Python and Mojo:

  • A list literal is written [2] and its type is written ListLiteral[Int] or List[Int] or list[int], not [Int].
  • A set literal is written {2} (once Mojo supports it) and its type is written Set[Int] or set[int], not {Int}.
  • A dict literal is written {'x': 2} (once Mojo supports it) and its type is written Dict[String, Int] or dict[string, int], not {String: Int}.
  • A tuple literal is written (2, 3) and its type is written Tuple[Int, Int] or tuple[int, int], not (Int, Int).

nmsmith avatar Jul 16 '24 02:07 nmsmith

I already raised #1863 for the syntax.

soraros avatar Jul 16 '24 03:07 soraros

Let's keep this topic focused on named return values :).

@dimitrilw : go's "initialize to zero" approach makes sense given Go's general philosophy on values and it is particularly useful with its specific error handling approach, but it would work better in Mojo for the result to be uninitialized. This feature is mostly useful for code that wants to control very low level details when you have non-movable types etc, not things like integers.

lattner avatar Jul 16 '24 05:07 lattner

Hey @nmsmith thanks for chiming in. Yeah it's the use of tuple literal syntax to represent a tuple type in the return slot that's causing issues, and in fact it make the (Int, Int) syntax contextual, because it would normally mean Tuple[type, type] where the values are Int and Int. @lattner what are your thoughts on this?

Mogball avatar Jul 16 '24 18:07 Mogball

@Mogball https://github.com/modularml/mojo/issues/1863#issuecomment-2044048924

soraros avatar Jul 16 '24 21:07 soraros

^^, I'd suggest splitting out discussion of tuple syntax to its own issue :-)

lattner avatar Jul 17 '24 05:07 lattner

@lattner Our point is that these issues are intertwined, because NRVO and tuples are fighting for the same position in the grammar. Only one syntax can win 😎.

(But I'll put my tuple-centric comments in the other thread, from now on.)

nmsmith avatar Jul 17 '24 05:07 nmsmith

We ultimately decided on fn foo() -> String as output: as the syntax to move this feature forward. This will be included in the next release! (Or tonight's nightly). Thank you @nmsmith for the proposal and everyone for the discussion!

We can continue to bikeshed the syntax if anyone feels strongly about this. This one is relatively easy to regex find+replace

Mogball avatar Jul 31 '24 17:07 Mogball

incidentally, the "Type as name" syntax was chose to align with python pattern syntax.

lattner avatar Jul 31 '24 21:07 lattner

the "Type as name" syntax was chose to align with python pattern syntax.

If you're referring to the pattern syntax case Point() as p:, then the syntax you've chosen doesn't actually match, because Point() is the syntax for a constructor, not the syntax for a type. If you instead write case Point as p: then Point is treated as a wildcard that matches anything, and both Point and p become aliases for the matched expression.

However, Python has the syntax except Exception as e: whose semantics is delightfully inconsistent 🥲 with the semantics of patterns. The -> String as output: syntax is arguably consistent with this syntax for except clauses.

nmsmith avatar Aug 01 '24 02:08 nmsmith

Anyway I'm glad to see this feature land. 🥳 This is a really clean way to eliminate unnecessary moves, and to initialize non-movable types.

nmsmith avatar Aug 01 '24 02:08 nmsmith

Fair point. The syntax isn't finalized, there is discussion about making an "init" keyword the preferred spelling, though I'm not fully up to speed on what the final outcome wll be.

lattner avatar Aug 02 '24 00:08 lattner