
Proposal: `@Result` to match cast builtins inference API

Open candrewlee14 opened this issue 1 year ago • 44 comments

Note: This is a formalized proposal based on discussion here: https://github.com/ziglang/zig/issues/5909#issuecomment-662684509

The proposal started with the addition of @ResultType, as in pub inline fn intCast(x: anytype) @ResultType. It has since been revised to use @Result and anytype in function declarations, as in pub inline fn intCast(x: anytype) anytype.

Problem: Builtin-exclusive inference API

With the recent merging of https://github.com/ziglang/zig/pull/16163, cast builtins now use an API that can't be replicated in regular userspace. For example, take the std.fmt.parseInt function:

fn parseInt(comptime T: type, buf: []const u8, base: u8) std.fmt.ParseIntError!T

which currently has to be used like:

const foo = std.fmt.parseInt(u32, content, 10); // T must be explicitly provided

compared to the usage of a cast builtin which can be used like

const bar: u32 = @intCast(x - y); // T is inferred

For the sake of consistency, it seems like parseInt should be able to be used in a similar fashion.

const bar: u32 = try std.fmt.parseInt(content, 10);
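For reference, the two call styles as they exist today can be sketched roughly like this. This is only an illustration, assuming a Zig version where cast builtins take their type from the result location (0.11 or later); the narrow helper and the values are made up and are not part of std.

```zig
const std = @import("std");

// Hypothetical userspace helper: the destination type has to be a parameter.
fn narrow(comptime T: type, x: u64) T {
    // Inside the function, the return type T provides the result type.
    return @intCast(x);
}

test "explicit type parameter vs. inferred result type" {
    const x: u64 = 50;
    const y: u64 = 8;

    // Builtin: the destination type is inferred from the annotation on `a`.
    const a: u32 = @intCast(x - y);

    // Userspace: the type has to be spelled out at the call site.
    const b = narrow(u32, x - y);
    const c = try std.fmt.parseInt(u32, "42", 10);

    try std.testing.expectEqual(@as(u32, 42), a);
    try std.testing.expectEqual(a, b);
    try std.testing.expectEqual(a, c);
}
```

The proposal is about letting functions like narrow and parseInt be called the way @intCast is called here.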

Proposal: @Result

The introduction of a builtin @Result could allow the declaration of std.fmt.parseInt to look like this:

fn parseInt(buf: []const u8, base: u8) std.fmt.ParseIntError!anytype {
    const T = @Result();
    ...
}

pub fn main() !void {
    const word1 = "12";
    const word2 = "42";
    const foo: u32 = try parseInt(word1, 10); // @Result is u32
    const bar: u64 = try parseInt(word2, 10); // @Result is u64 
    ...
}

Benefits

  • This democratizes this kind of inference API currently exclusive to builtins.
  • This may improve the consistency of callsites of functions with inferable types.
  • Could benefit from cast-builtin-related improvements in type inference resolution, like in the case of something like:
    // possible builtin inference improvement
    const foo: u32 = @intCast(a + b) * @intCast(c);
    // potential downstream benefit
    const bar: u32 = try parseInt(word1, 10) + try parseInt(word2, 10);
    
  • This also allows for the user to implement a function that looks like cast builtins from a caller's perspective, like this (via https://github.com/ziglang/zig/issues/5909#issuecomment-1099710303):
    pub inline fn intCast(x: anytype) anytype {
        return @intCast(x);
    }
    

Drawbacks

  • Allows functions with identical functionality to be defined with 2 different APIs. When should a user define a function with @Result inference vs. having a comptime T: type parameter?
  • ?

candrewlee14 avatar Jul 04 '23 00:07 candrewlee14

When should a user define a function with @ResultType inference vs. having a comptime T: type parameter?

If you're returning T, never according to the new logic in zig, since you can just wrap it in an @as(T, f(...)).

One big drawback of this is that you're adding an invisible comptime parameter which silently adds more instantiations of your functions.

N00byEdge avatar Jul 04 '23 02:07 N00byEdge

I really like having custom casting functions - f.e. enhanced by comptime type checks specific to a use case. With the benefits of the new inferred result type already visible in some code, I'd hate to give them up when switching from builtins to user-space functions.

In general the interface of @ResultType() providing the type of the result location seems minimal and sufficient to me, so a good fit for Zig. However, the original proposal text glosses over type unwrapping a bit, and I think it's a rough edge worth bringing up specifically:

The cast builtins currently already unwrap error unions and optionals: const c: error{Z}!?u8 = @intCast(@as(u16, 40)); deduces u8. It would be easier to automatically do the same thing in @ResultType().

IMO hiding this step from the user (and discarding the additional information) would be a bit of a shame though, but I can't quite figure out how to make it work.

  • If given the full type, userspace can implement these steps manually, f.e. in a helper function, making the return type something along the lines of NonOptionalPayload(@ResultType()). Note that this will practically work as long as NonOptionalPayload(@ResultType()) can implicitly coerce to the actual @ResultType(), T -> E!?T will always work.
  • However, concluding from the last bullet point, the return type std.fmt.ParseIntError!@ResultType() constructed in one example would probably not work: No matter what R = @ResultType() we provide, the returned type E!R can never match the original R expected/deduced from the expression's result location. ... That is, unless result locations were propagated through (error-unwrapping if and) try expressions, and then stripped E!T -> T.

Maybe we would want both options, accessible as @ResultType() and @ResultTypeErrorUnionPayload()/@NonErrorResultType() ? Or we choose not to provide the ~~second~~ first one, because behavior dependent on the callsite error set is too implicit. Well, I can't really think of a non-confusing use case right now, so maybe it can actually be that simple.


One big drawback of this is that you're adding an invisible comptime parameter which silently adds more instantiations of your functions.

That is true; in my custom meta functions I often have a nicer interface in f that wraps an fImpl function with more verbose signature. Especially when stripping a type (like ?T -> T) is done in userspace, you would probably want an Impl function like that to deduplicate the instantiation. For me this is an okay approach in meta code, while in other areas it could get rather crowded.

Although for explicitness here's another idea: Instead of implicitly passing a type for @ResultType() to read from, we could instead make it an explicit parameter declaration:

fn(comptime R: type = @ResultType(), x: R) R {return x;}

Downsides:

  • We don't have assignment syntax in arguments for any other use case yet.
  • We declare a parameter slot that needs to be omitted from call sites.

The most in-line with current syntax would actually be a capture IMO:

// `keyword |capture|` has precedence in Zig
fn |R| f(x: R) R {return x;}
// could use an additional keyword
fn resulttype |R| f(x: R) R {return x;}
// (could use an operator, but no precedence of this in Zig, therefore I like it less)
fn -> |R| f(x: R) R {return x;}

// could also put it after the function name, so `fn <name>` remains Ctrl+F-/grep-/searchable
fn f |R| (x: R) R {return x;}

rohlem avatar Jul 04 '23 10:07 rohlem

I like this idea a lot

What if it was infer or @Infer instead? One could imagine a future follow-up proposal for constraining the type to be an integer without specifying the size or implement a handful of functions like isLessThan, etc

Jarred-Sumner avatar Jul 04 '23 21:07 Jarred-Sumner

related, infer keyword is proposed here https://github.com/ziglang/zig/issues/9260 too, the syntax actually pairs pretty well

nektro avatar Jul 04 '23 22:07 nektro

@Jarred-Sumner So you're thinking something like this instead?

fn parseInt(buf: []const u8, base: u8) std.fmt.ParseIntError!infer T {
    ...
}

The infer keyword might communicate the existence of multiple function instances better than a builtin @ResultType.

That could also be complementary if the infer syntax was already available to be used on parameters (as in the mentioned #9260).

fn foo(bar: infer T) infer K {
    ...
}

candrewlee14 avatar Jul 05 '23 00:07 candrewlee14

I like this proposal as is, but I can also see a slight modification:

@ResultType() seems to function basically the same as anytype for parameters, in that they both implicitly make the function polymorphic. So reusing anytype in the function signature makes sense to me. @ResultType() would often still be necessary to access the actual result type within the function, but it could be made to only be usable within a function body.

Advantages:

  • anytype is shorter and easier to read/type than @ResultType()
  • @ResultType() can be used even when the result type is specified explicitly (removes redundancy when the result type in the function signature is a complicated comptime expression)
  • anytype communicates intention better when used in a function signature (possibly subjective, but IMO it makes it clear that you need to use comptime checks if you want to limit what the result type might be, just like with anytype parameters)
  • The @ResultType() intrinsic and inferred-return-type-via-anytype could be implemented as two separate, smaller features

Disadvantages:

  • Perhaps slightly more difficult to learn - two different keywords to remember (in the context of return type inference; anytype already exists)

bcrist avatar Jul 05 '23 01:07 bcrist

related: #447

ghost avatar Jul 05 '23 06:07 ghost

I think the anytype return type also is a valid option, but if so, @ResultType() is a terrible name and I would much rather have @ReturnType()

N00byEdge avatar Jul 05 '23 09:07 N00byEdge

I like @ReturnType() much better than @ResultType(). It's way clearer that the type is inferred from the function return location, wherever that may be.

AssortedFantasy avatar Jul 05 '23 21:07 AssortedFantasy

Hmm, personally both feel pretty similar. What about @CallsiteType?

candrewlee14 avatar Jul 06 '23 04:07 candrewlee14

To me Return and Result can both be read to mean the result returned / decided by the function itself. However, there is at least precedent for the term "Result Location Semantics" in Zig's nomenclature.

I agree CallSite (or maybe CallSiteDestination if there is some propagation through expressions like try) would be more explicit, imo preferable. (nitpick note: Wiktionary lists both call site and callsite although the first seems preferred, Wikipedia also went with the first spelling. Langref is currently unopinionated at 5 vs 5 occurrences.)

rohlem avatar Jul 06 '23 07:07 rohlem

As already mentioned in other discussions, it's a bit cumbersome to use @as to specify the return type of an expression. For example:

std.log.info("{}", .{@as(u32, try std.fmt.parseInt(content, 10))})

At the same time, Zig already has a perfectly good way to specify the type of a constant/variable/function-argument with the colon syntax:

const x: u32 = try std.fmt.parseInt(content, 10);

It would be really nice to allow the same syntax to specify the type of an arbitrary expression (like in Julia lang, but they use double colon for this purpose). So the example above will look like this:

std.log.info("{}", .{try std.fmt.parseInt(content, 10): !u32})

Another example from here with builtins. Old syntax:

return @intCast(@as(i64, @bitCast(val)));

New syntax:

return @intCast(@bitCast(val): i64);

log0div0 avatar Jul 08 '23 08:07 log0div0

I don't have a strong opinion either way on this proposal, but I have a few notes:

  • @ReturnType would IMO be a very poor name; that sounds like something you'd use in a function body to get the return type of the function. @ResultType is a clear name which uses the lingo of RLS (rightly, since that is where this feature comes from). Alternatively, returning anytype or some infer T (if #9260 gets in) would also be reasonable, since it would show that this works a lot like a generic parameter in that it creates a separate instantiation.
  • Builtins using an API which can't be replicated in userspace is not new or controversial, and in fact is the norm. For instance, @min/@max/@TypeOf/@compileLog are varargs, @field is an lvalue, and @import's argument must be a string literal.
  • @log0div0, if you want to seriously propose that syntax it should be a separate issue.

mlugg avatar Jul 09 '23 19:07 mlugg

@mlugg

  • @ReturnType would IMO be a very poor name; that sounds like something you'd use in a function body to get the return type of the function. @ResultType is a clear name which uses the lingo of RLS (rightly, since that is where this feature comes from). Alternatively, returning anytype or some infer T (if #9260 gets in) would also be reasonable, since it would show that this works a lot like a generic parameter in that it creates a separate instantiation.

Yes, @ReturnType() was in response to using the anytype keyword for the return type. That's exactly what the premise is there. It would refer to the return type of the function, not the inferred type from the call site. There would be a level of indirection where the function return type says "infer the return type" and then you're saying "use the return type, no matter if inferred or not."

If we have

fn a() anytype {
  return std.mem.zeroes(@ReturnType());
}

N00byEdge avatar Jul 09 '23 20:07 N00byEdge

I have a mostly neutral stance on this proposal.

Just a thought on the name. Say we have the following code:


pub fn build_something() @ResultType() {
  var something: @ResultType() = undefined;
  switch (@typeInfo(@ResultType())) {
  ....
  }
  return something;
}

It doesn't really work with anytype in the body (same with infer T):


pub fn build_something() anytype {
  var something: anytype = undefined;
  switch (@typeInfo(anytype)) {
  ....
  }
  return something;
}

It should support assignment:


pub fn build_something() @ResultType() {
  const T = @ResultType(); // comptime-known
  var something: T = undefined;
  switch (@typeInfo(T)) {
  ....
  }
  return something;
}

From here, we could argue that the following would also make sense: (@N00byEdge suggestion)


pub fn build_something() anytype {
  const T = @ReturnType(); // comptime-known
  var something: T = undefined;
  switch (@typeInfo(T)) {
  ....
  }
  return something;
}

Now the argument I could have for it, is easier code maintenance if we have something like:


// myfield has type IsoDate

mystruct.myfield = deserialize(IsoDate, str);

// If we change myfield type, we need to find the deserialize call and change the type too

// with proposal

mystruct.myfield = deserialize(str);

// No need to change the type here

For types that can be coerced automatically, this could prevent some bugs and ensure the returned type is always exactly the type of the destination variable that the returned value is assigned to (similar to why the first argument was removed from the cast builtins). Taking this into account, I could be slightly in favor of this.

kuon avatar Jul 20 '23 14:07 kuon

@kuon Basically agree with everything you've got here. I too am neutral on this.

Now the argument I could have for it, is easier code maintenance if we have something like:

But we have to be careful, that is an argument against it too, as mentioned before:

One big drawback of this is that you're adding an invisible comptime parameter which silently adds more instantiations of your functions.

N00byEdge avatar Jul 20 '23 14:07 N00byEdge

One big drawback of this is that you're adding an invisible comptime parameter which silently adds more instantiations of your functions.

I don't think this is actually a very strong argument. Comptime arguments are already passed the same as normal parameters, meaning a function's call-site can't actually always tell us when passing a different value would incur an extra instantiation.

For example:

foo(1, 2);
foo(3, 4);

Assuming one or both of the parameters of foo could be comptime, there are potentially two generic instantiations of foo here, and you can't know that until you look at the function prototype. This is also a characteristic of inline parameters, which will make a function instantiate a runtime variant or a corresponding comptime variant depending on the comptime-known-ness of the argument.

This feature would have the same drawback as the status quo and the accepted proposal: you have to read the function signature to know whether a call will generate separate instantiations.
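To make that concrete, here is one possible prototype for foo (purely hypothetical, just to illustrate the point). Nothing at the two call sites reveals that the first argument is comptime and that each distinct value gets its own instantiation.

```zig
const std = @import("std");

// Hypothetical prototype: `a` is comptime, so every distinct value of `a`
// produces a separate instantiation. `b` is an ordinary runtime parameter.
fn foo(comptime a: u32, b: u32) u32 {
    return a + b;
}

test "the call sites look the same either way" {
    // Two instantiations (a = 1 and a = 3), yet these read like plain calls.
    try std.testing.expectEqual(@as(u32, 3), foo(1, 2));
    try std.testing.expectEqual(@as(u32, 7), foo(3, 4));
}
```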

InKryption avatar Jul 20 '23 14:07 InKryption

I agree with @InKryption that there are many places where function instances can be created transparently. I think this feature would be similar to functions accepting anytype.

On embedded platforms, when I want to limit the use of some functions to reduce binary size, I work around this problem by adding a comptime assertion on the type. For example, if I want a generic over u8 and u16 but not signed integers or other sizes, I do something like:

pub fn something(myint: anytype) void {
    switch (@TypeOf(myint)) {
        u8, u16 => {},
        else => @compileError("type not supported"),
    }
}

I actually have helper functions for that, but you get the idea.
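A compiling variant of that check might look like the sketch below. It uses @TypeOf to get the argument's type; the return value and the error message are additions for illustration only.

```zig
const std = @import("std");

// Accepts only u8 and u16; any other argument type is rejected at compile time,
// so no instantiation can be generated for it.
fn something(myint: anytype) @TypeOf(myint) {
    switch (@TypeOf(myint)) {
        u8, u16 => {},
        else => @compileError("type not supported: " ++ @typeName(@TypeOf(myint))),
    }
    return myint;
}

test "only the allowed integer types instantiate" {
    try std.testing.expectEqual(@as(u8, 1), something(@as(u8, 1)));
    try std.testing.expectEqual(@as(u16, 500), something(@as(u16, 500)));
    // something(@as(i32, 1)); // would fail to compile: "type not supported: i32"
}
```

The same kind of switch could presumably be applied to the inferred result type if this proposal were accepted.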

kuon avatar Jul 20 '23 23:07 kuon

Full support for this proposal! It would be so much more ergonomic and readable to be able to use this in userspace, such as with std.mem.zeroes() for setting default field values.

digitalcreature avatar Aug 07 '23 20:08 digitalcreature

I'd like to propose @Result() as the name of the builtin. No hungarian notation; PascalCase already indicates it is a type (see @This())

digitalcreature avatar Aug 07 '23 20:08 digitalcreature

What will happen if the call site does not have a specific type, e.g. due to peer type resolution or anytype?

ghost avatar Sep 28 '23 11:09 ghost

the same you'd get with the cast builtins: error: @intCast must have a known result type and note: use @as to provide explicit result type
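As a small sketch of what that looks like with today's cast builtins (assuming Zig 0.11 or later; the variable names are made up):

```zig
const std = @import("std");

test "a cast builtin needs a known result type" {
    const wide: u64 = 300;

    // const bad = @intCast(wide);
    // ^ rejected: there is no annotation, so there is no result type to infer.

    // Wrapping the call in @as supplies the result type explicitly.
    const ok = @as(u16, @intCast(wide));
    try std.testing.expect(@TypeOf(ok) == u16);
    try std.testing.expectEqual(@as(u16, 300), ok);
}
```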

nektro avatar Sep 28 '23 19:09 nektro

There seem to be a lot of suggestions here on how to signal to the compiler to infer the output type of a function. Perhaps NOT specifying any output type should just mean "infer it"? It's hard to beat 0 characters as far as syntax minimalism goes.

fn build_something() {
  var something = ..;
  return something;
}

mohamed82008 avatar Feb 07 '24 19:02 mohamed82008

Yeah, just inferring is the cleanest imo.

RossComputerGuy avatar Feb 07 '24 20:02 RossComputerGuy

There seem to be a lot of suggestions here on how to signal to the compiler to infer the output type of a function. Perhaps NOT specifying any output type should just mean "infer it"? It's hard to beat 0 characters as far as syntax minimalism goes.

fn build_something() {
  var something = ..;
  return something;
}

I agree that it's a clean option, but now you've broken one of Zig's core principles: being very readable, even to people not familiar with Zig. If a random user wants to know the possible return types of the function, they will have to look at every place where that function is called to try to find a type, whereas @Result() or @ReturnType() is explicit enough that anyone can understand that this function's return type is inferred from the call site. Maybe I'm stupid, but if I imagine myself going into a codebase and finding function prototypes that don't return anything, I'd be pretty confused about what they are doing, especially if I see a return keyword.

pierrelgol avatar Feb 07 '24 21:02 pierrelgol

I agree that it's a clean option, but now you've broken one of Zig's core principles: being very readable

Wouldn't not specifying a type with const or var constitute the same thing?

RossComputerGuy avatar Feb 07 '24 23:02 RossComputerGuy

Wouldn't not specifying a type with const or var constitute the same thing?

Exactly, const a: anytype = f(..) is not more readable than const a = f(..), it's less writeable though.

mohamed82008 avatar Feb 08 '24 03:02 mohamed82008

There's an important distinction to make between the proposed @Result builtin and what many people would expect of inferring the return type of a function: the @Result builtin gives the function's result type (as determined by the function's callsite), not an inferred type based on the return expressions in the function itself. To give an example of where this might be confusing:

fn add(a: u32, b: u32) @Result() {
    return a + b;
}

const c = add(2, 2); // error: call to 'add' must have a known result type

(the error here is by analogy with existing builtins which rely on their result type: https://github.com/ziglang/zig/issues/16313#issuecomment-1739865764)

If defining a function without any return type had this behavior, it would be very confusing for new users who might expect the return type to be inferred as u32, or who just forgot to write the return type and are then confronted with a bizarre and unexpected error at the callsite of the function (rather than at its definition).

Additionally, regardless of how the return type is written for such functions (omitted, anytype, etc.), the @Result() builtin (or something equivalent) would still be needed to access the result type from within the function body. For example, implementing parseInt using an inferred result type (as in the original proposal description) would require the use of @Result() in the function body to determine what type of integer needs to be parsed.
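To illustrate why the body needs access to the type, here is a simplified decimal parser written in today's style with an explicit comptime T. parseDecimal is a made-up stand-in, not the real std.fmt.parseInt, and it assumes T is an unsigned integer type of at least 4 bits. Every use of T in the body is a place where a callsite-inferred version would need @Result() or an equivalent.

```zig
const std = @import("std");

// Hypothetical, simplified parser: decimal digits only, unsigned T only.
// An empty buffer yields 0 in this sketch.
fn parseDecimal(comptime T: type, buf: []const u8) error{ InvalidCharacter, Overflow }!T {
    var acc: T = 0; // the accumulator's type depends on T
    for (buf) |ch| {
        if (ch < '0' or ch > '9') return error.InvalidCharacter;
        const digit: T = @intCast(ch - '0');
        // Overflow checks are performed in T, so the body must know T.
        acc = try std.math.mul(T, acc, 10);
        acc = try std.math.add(T, acc, digit);
    }
    return acc;
}

test "the body depends on the destination type" {
    try std.testing.expectEqual(@as(u8, 255), try parseDecimal(u8, "255"));
    try std.testing.expectError(error.Overflow, parseDecimal(u8, "256"));
    try std.testing.expectEqual(@as(u16, 256), try parseDecimal(u16, "256"));
}
```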

ianprime0509 avatar Feb 08 '24 04:02 ianprime0509

My hope is that the following would work.

fn add(a: u32, b: u32) {
    return a + b;
}

const c = add(2, 2); 

In this case, the compiler knows that add can only accept u32s as inputs, so the comptime_ints at the call site will be cast to that type. The compiler also knows that add can only return u32, so when I assign its output to c, c can never (in a non-confusing world) be anything but u32. If I can do this logic in my head, then so can the compiler. Of course, the effect of this on compilation speed needs to be determined. It also gets trickier when the input to the function is anytype, comptime_int or comptime_float, in which case the output type may be ambiguous and a call-site type declaration or explicit cast may be necessary to resolve the ambiguity.

In some ways, making the above piece of code work is orthogonal to (and perhaps easier than) being able to access the inferred output type in the function body itself with @Result(). This is because if the code in the function body depends on the inferred output type and the inferred output type itself naturally depends on the code in the function body, then there is a circular dependence. This is probably a recipe for trouble.

mohamed82008 avatar Feb 08 '24 04:02 mohamed82008

This proposal is not really about inference of function return types by their body contents. If you have suggestions around that, that might belong in a different proposal.

The goal of this proposal is to enable callsite-inferred generics. I think it's important to keep the anytype in the function signature to signal that this is a generic function. Moving the anytype from the list of parameters to the return type in the function declaration doesn't lose readability IMO, plus it gains inference ergonomics (and arguably readability) at the callsite. Changes to builtins in the last year allowed for more readable casts, for example, and this proposal is just about enabling those same semantics in non-builtin user code.

candrewlee14 avatar Feb 08 '24 06:02 candrewlee14