RFC: Make function definitions expressions

Open hryx opened this issue 5 years ago • 88 comments

Overview

This is a proposal based on #1048 (thank you to everyone discussing in that thread). I opened this because I believe that conversation contains important ideas but addresses too many features at once.

Goals

  • Provide syntactic consistency among all statements which bind something to an identifier
  • Provide syntactic foundation for a few features: functions-in-functions (#229), passing anonymous functions as arguments (#1048)

Non-goals

  • Closures

Motivation

Almost all statements which assign a type or value to an identifier use the same syntax. Taken from today's grammar (omitting a few decorations like align for brevity):

VariableDeclaration = ("var" | "const") Symbol option(":" TypeExpr) "=" Expression

The only construct which breaks this format is a function definition. It could be argued that a normal function definition consists of:

  1. an address where the function instructions begin;
  2. the type information (signature, calling convention) of the function;
  3. a symbol binding the above to a constant or variable.

Ideally, number 3 could be decoupled from the other two.

Proposal

Make the following true:

  1. A function definition is an expression
  2. All functions are anonymous
  3. Binding a function to a name is accomplished with assignment syntax
const f = fn(a: i32) bool {
    return (a < 4);
};

Roughly speaking, assigning a function to a const would equate to existing behavior, while assigning to a var would equate to assigning a function pointer.
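
For comparison, here is a minimal sketch of the same const/var distinction in status-quo syntax. It assumes a recent Zig release (std.debug.print and the *const fn pointer type postdate this thread), and the helper names lessThanFour and isFiftyFour are hypothetical.

```zig
const std = @import("std");

fn lessThanFour(a: i32) bool {
    return a < 4;
}

fn isFiftyFour(a: i32) bool {
    return a == 54;
}

pub fn main() void {
    // Binding to a const: the callee is known at compile time.
    const f = lessThanFour;

    // Binding to a var: a function pointer that can be reassigned at runtime.
    var g: *const fn (i32) bool = &lessThanFour;
    std.debug.print("{} {}\n", .{ f(3), g(3) });

    g = &isFiftyFour;
    std.debug.print("{}\n", .{g(3)});
}
```

Under the proposal, the two named helper functions would become function expressions on the right-hand side of the bindings.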

Benefits

  • Consistency. There is alignment with the fact that aggregate types are also anonymous.
  • Syntactically, this paves the way for passing anonymous functions as arguments to other functions.
  • I have a suspicion that this will make things simpler for the parser, but I'd love to have that confirmed/debunked by someone who actually knows (hint: not me).
  • Slightly shrinks the grammar surface area:
- TopLevelDecl = option("pub") (FnDef | ExternDecl | GlobalVarDecl | UseDecl)
+ TopLevelDecl = option("pub") (ExternDecl | GlobalVarDecl | UseDecl)

Examples

The main function follows the same rule.

pub const main = fn() void {
    @import("std").debug.warn("hello\n");
};

The extern qualifier still goes before fn because it qualifies the function definition, but pub still goes before the identifier because it qualifies the visibility of the top level declaration.

const puts = extern fn([*]const u8) void;

pub const main = fn() void {
    puts(c"I'm a grapefruit");
};

Functions as the resulting expressions of branching constructs. As with other instances of peer type resolution, each result expression would need to be implicitly castable to the same type.

var f = if (condition) fn(x: i32) bool {
    return (x < 4);
} else fn(x: i32) bool {
    return (x == 54);
};

// Type of `g` resolves to `?fn() !void`
var g = switch (condition) {
    12...24 => fn() !void {},
    54      => fn() !void { return error.Unlucky; },
    else    => null,
};

Defining methods of a struct. Now there is more visual consistency in a struct definition: comma-separated lines show the struct members, while semicolon-terminated statements define the types, values, and methods "namespaced" to the struct.

pub const Allocator = struct.{
    allocFn:   fn(self: *Allocator, byte_count: usize, alignment: u29) Error![]u8,
    reallocFn: fn(self: *Allocator, old_mem: []u8, new_byte_count: usize, alignment: u29) Error![]u8,
    freeFn:    fn(self: *Allocator, old_mem: []u8) void,
    
    pub const Error = error.{OutOfMemory};

    pub const alloc = fn(self: *Allocator, comptime T: type, n: usize) ![]T {
        return self.alignedAlloc(T, @alignOf(T), n);
    };

    // ...
};

Advanced mode, and possibly out of scope.

Calling an anonymous function directly.

defer fn() void {
    std.debug.warn(
        \\Keep it down, I'm disguised as Go.
        \\I wonder if anonymous functions would provide
        \\benefits to asynchronous programming?
    );
}();

Passing an anonymous function as an argument.

const SortFn = fn(a: var, b: var) bool; // Name the type for legibility

pub const sort = fn(comptime T: type, arr: []T, f: SortFn) void {
    // ...
};

pub const main = fn() void {
    var letters = []u8.{'g', 'e', 'r', 'm', 'a', 'n', 'i', 'u', 'm'};

    sort(u8, letters, fn(a: u8, b: u8) bool {
        return a < b;
    });
};

What it would look like to define a function in a function.

pub const main = fn() void {
    const incr = fn(x: i32) i32 {
        return x + 1;
    };

    warn("woah {}\n", incr(4));
};
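
As a point of reference, status-quo Zig can approximate a function in a function by using a struct purely as a namespace. The sketch below assumes a recent Zig release, and the names nestedDemo and call are hypothetical.

```zig
const std = @import("std");

// Returns incr(4), where incr is declared inside this function body.
fn nestedDemo() i32 {
    // The struct exists only to hold the nested function declaration.
    const incr = struct {
        fn call(x: i32) i32 {
            return x + 1;
        }
    }.call;

    return incr(4);
}

pub fn main() void {
    std.debug.print("woah {}\n", .{nestedDemo()});
}
```

The nested function cannot read runtime locals of the enclosing function, which matches the non-goal above: this is not a closure.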

Questions

Extern?

The use of extern above doesn't seem quite right, because the FnProto evaluates to a type:

extern puts = fn([*]const u8) void;
              --------------------
                 this is a type

Maybe it's ok in the context of extern declaration, though. Or maybe it should look like something else instead:

extern puts: fn([*]const u8) void = undefined;

Where does the anonymous function's code get put?

I think this is more or less the same issue being discussed in #229.

Counterarguments

  • Instructions and data are fundamentally separated as far as both the programmer and the CPU are concerned. Because of this conceptual separation, a unique syntax for function body declaration is justifiable.
  • Status quo is perfectly usable and looks familiar to those who use C.

hryx avatar Nov 13 '18 09:11 hryx

I love the idea overall, but wonder about the syntax a little. Defining the function and the function type is a little too close:

const A = fn(i32) void;
const B = fn(x: i32) void {};
var C: A = B;

@Hejsil just redid the stage 1 parser and probably could say if this can be parsed correctly.

bheads avatar Nov 13 '18 15:11 bheads

given we have syntactic sugar already in the form of optional_pointer.? would it be possible to make pub fn foo() void {} syntactic sugar for pub const foo = fn() void {};?

emekoi avatar Nov 13 '18 15:11 emekoi

@bheads Parsing fn defs and fn protos uses the same grammatical rules already, so this proposal doesn't make a difference in how similar these constructs will be.

@emekoi Given that Zig values "only one way", probably not. Pretty sure .? exists as asserting for not null is very common when calling into C. We also don't have .! (syntactic sugar for catch unreachable).

Hejsil avatar Nov 13 '18 15:11 Hejsil

@Hejsil according to this, optional_pointer.? was, and still is, syntactic sugar for optional_pointer orelse unreachable.

emekoi avatar Nov 13 '18 16:11 emekoi

@emekoi I know. We give syntactic sugar when it really affects the readability to not have it. try is a good example. ((a orelse unreachable).b orelse unreachable).c is a lot worse than a.?.b.?.c so we give syntactic sugar here. I don't think there is really a value in keeping the old fn syntax if we're gonna accept this proposal.

Hejsil avatar Nov 13 '18 16:11 Hejsil

@bheads To me the syntax seems consistent in that curly braces after a type instantiate that type. The only missing step towards full consistency would be parameter names. The argument list of a function introduces those variables into the function's scope.

const A = struct {a: i32}; //type expression/definition
const a = A {.a = 5}; //instantiation
const F = fn(a: i32) void; //type expression/definition
const f = F { return; }; //instantiation

When instantiating a function type (F above), I would think the parameters to be exposed via the names used in the function type definition/expression. While that might decouple their declaration from their usage, it's similar to struct definitions assigning names to their members. Alternatively, if that seems too strange, I could see a builtin of the form @functionArg(index: comptime_int) T (or possibly @functionArgs() [...] returning a tuple (#208) / anonymous struct) to serve niche/"library" use cases.

rohlem avatar Nov 13 '18 22:11 rohlem

@rohlem I've contemplated that "define/instantiate a function of named type F" idea before, but it breaks down quickly for a few reasons:

  1. The parameter names are not part of the actual function type. This is fine and even useful in some cases, I think.
  2. Imagine if you wanted to write a function that implemented function type F as specified by some other library author, but you had to use the param names that that author chose. That would cause problems, including the fact that in Zig you can't shadow or otherwise repurpose any identifiers which are currently in scope. (So if this imaginary F takes an x: i32, you'd better not already have an x in scope). In Zig, you always get to choose your var identifiers, even for imported stdlib packages.
  3. Making it possible to define the body of a function without having the parameter names/types and return type visible immediately above that body would be very harmful to readability and comprehension. Not just in 6 months, but now while you are currently writing the function. Unfortunately, a @functionArg(...) builtin wouldn't help there.

I agree that level of consistency is cool and enticing, but I think in this case it clearly works against Zig's goals.

hryx avatar Nov 13 '18 22:11 hryx

@hryx For the record, I overall agree with your stances.

  1. I agree that two function types (fn(a: i2) void) and (fn(b: i2) void) should compare equal. I think it would be possible to have the names as extra data in their type object anyway, which would require a couple of workarounds in f.e. comptime caching though, so it's not ideal.
  2. Imagine the same with a struct retrieved from a @cImport call. Status quo Zig does not (yet) feature struct member renaming (EDIT: as in aliasing), though I'd be all for a proposal akin to that idea, which could then equally apply to function types. (Defining your own struct with different names will probably work if handled carefully, but it's not 100% waterproof.) (EDIT: Now I see, I guess a function scope variable is different from a member name from a language perspective, so "shadowing" applies only to the former.)
  3. I agree that it harms readability, but in code that instantiates a generic function type you're already reasonably decoupled from the concrete type. While copying around the function head worked well enough up until now, I don't think there's a suitable replacement for defining a function instance like f.e. callbackType{trigger_update(); return @functionArg(0);} (EDIT: with callbackType being variable, coming f.e. from a comptime type argument). I think this would be the closest alternative and Zig-iest syntax for instantiating function types.
  4. The biggest argument I currently see against it would be the fact that the value of a type T in T { } now dictates how to parse the instantiation (member list vs function code), which moves us further away from context-free grammar.

Either way, just adding to the discussion. Sorry for hijacking the thread, I definitely don't think the details about decoupling parameters should stand in the way of the original proposal.

rohlem avatar Nov 13 '18 23:11 rohlem

I agree with @hryx:

Defining the function and the function type is a little too close

We could approximate the switch case syntax and do something like this, which opens the door for function expressions:

const A = fn(i32) void;
const B = fn(x: i32) void => { block; };
const X = fn(x: i32) u8 => expression;
var C: A = B;

raulgrell avatar Nov 15 '18 16:11 raulgrell

@raulgrell That would also solve the ambiguity with braces in the return type.

bheads avatar Nov 15 '18 19:11 bheads

@bheads yep, I think it came up in the discussion. The only weird case I could come up with, from @hryx's post:

var g = switch (condition) {
    13   => fn() !void => error.Unlucky,
    else => null,
};

raulgrell avatar Nov 15 '18 23:11 raulgrell

What if, instead of the fat arrow (=>), we used the placeholder syntax of while and for loops?

This allows the separation of parameter names from the type specification.

Examples:

// Typical declaration
const add = fn(i32, i32)i32 |a, b| {return a + b;};

// Usable inline
const sorted = std.sort(i32, .{3, 2, 4}, fn(i32,i32)bool |lhs, rhs| {return lhs >= rhs;});

// With a predefined type.
const AddFnType = fn(i32,i32)i32;
const otherAdd = AddFnType |a, b| {return a + b;};

Additionally, in line with #585, we could infer the type of the function declaration when obvious:

// Type is inferred from the argument spec of sort
// However, the function type is created from the type parameter given
// earlier in the parameters, so I'm not sure how feasible this is
const sorted = std.sort(i32, .{3, 2, 4},  .|lhs, rhs| {return lhs >= rhs;});

We could even make the definition of the function take any expression, not just a block expression, but that may be taking it too far.

I think there is a lot of potential in this feature to provide inline function definition clarity without a lot of cognitive overhead.

(Please forgive any formatting faux pas, this was typed on mobile. I'll fix them later.)

williamcol3 avatar Dec 04 '18 18:12 williamcol3

The following is already possible (version 0.4):

const functionFromOtherFile = @import("otherfile.zig").otherFunction;
_ = functionFromOtherFile(0.33);

I prefer the "standard" way of defining functions as it is more visually pleasing to me, but I don't see any real problems with this proposal either.

ghost avatar May 15 '19 15:05 ghost

This is now accepted.

@williamcol3 interesting idea, but I'm going to stick to @hryx's original proposal. Feel free to make a case for your proposed syntax in a separate issue.

The path forward is:

  1. Update the parsers to accept both.
  2. Update zig fmt to update the syntax to the new canonical way.
  3. Wait until the release cycle is done, and release a version of zig.
  4. Delete the deprecated syntax from the parsers.

Extern can be its own syntax, or it can be downgraded to builtin function, which might actually help #1917.

andrewrk avatar Jul 04 '19 03:07 andrewrk

Wasn't a goal of Zig to stay close to the syntax of C? I would say with this change, there is quite a bit of difference compared to C. This would make the step for current C developers to move to Zig way bigger.

However, the change makes sense in the current expression system of Zig and I like it, but I think that this is one extra step to overcome for C developers moving to Zig.

FireFox317 avatar Jul 04 '19 06:07 FireFox317

Extern can be its own syntax, or it can be downgraded to builtin function, which might actually help

Extern functions could just be variables with a function type, but no content:

// puts is a function value with the given function type
extern const puts :  fn([*]const u8) void;

// main is a function with the implicit type
const main = fn() {
    puts("Hello, World!\n");
};

// foo is a function of type `fn()`
const foo : fn() = fn() {
    puts("called foo\n");
};

To me this seems logical: if we treat functions as values, we can also declare those values extern, which gives a consistent syntax for declaring extern and internal functions.

ikskuh avatar Jul 04 '19 09:07 ikskuh

// Usable inline
const sorted = std.sort(i32, .{3, 2, 4}, fn(i32,i32)bool |lhs, rhs| {return lhs >= rhs;});

the type here could be inferred (similar to enum literals), making it:

const sorted = std.sort(i32, .{3, 2, 4}, |lhs, rhs| {return lhs >= rhs;});

Which isn't a bad "short function syntax" at all.... @williamcol3 please do make another issue for your proposal.

daurnimator avatar Jul 24 '19 08:07 daurnimator

Could the function passed to sort be comptime, so that specialization (and inlining) can occur for each distinct function that is passed?

c-cube avatar Jul 24 '19 13:07 c-cube

I just noticed that this proposal has been accepted and thought I'd throw my two cents in. I don't see a way of applying the extern keyword to the function definition, as extern requires that something has a name, but with this proposal function definitions would be anonymous and only the const/var they are assigned to would have a name. This would also be consistent with how extern is applied to the declaration (the pub const bit) rather than the definition/assignment of variables and types.

SamTebbs33 avatar Jul 24 '19 14:07 SamTebbs33

Why not keep it the way it is right now?

The grammar states

FnProto <- FnCC? KEYWORD_fn IDENTIFIER? LPAREN ParamDeclList RPAREN ByteAlign? LinkSection? EXCLAMATIONMARK? (KEYWORD_var / TypeExpr)

# Fn specific
FnCC
    <- KEYWORD_nakedcc
     / KEYWORD_stdcallcc
     / KEYWORD_extern
     / KEYWORD_async (LARROW TypeExpr RARROW)?

If we take the "full reroute" and make global functions also just "variables", we get this:

// main is a function with the implicit type
pub const my_c_fun = extern fn() { // the IDENTIFIER is removed here, the FnCC not
    puts("Hello, World!\n");
};

this will also work in expression context:

iterate_and_call(my_array, stdcallcc fn(x : u32) {
    put(x);
});

but: in this context, we could also infer the required calling convention by the type that is required by iterate_and_call as the function type has to match the parameter type anyways

EDIT:

I don't see a way of applying the extern keyword to the function definition, as extern requires that something has a name

The problem is that extern both states linkage as well as cdecl calling convention and also imports symbols from other translation units.

ikskuh avatar Jul 25 '19 09:07 ikskuh

Where does the anonymous function's code get put?

In the current (and, as far as the zig binary is concerned, only) LLVM module. This question is not relevant to this discussion, unless you are discussing symbol visibility. As functions in LLVM always have names, we would have to auto-name them, probably based on the scope and the scoped variable being assigned to.

shawnl avatar Jul 28 '19 22:07 shawnl

As functions in LLVM always have names, we would have to auto-name them, probably based on the scope and perhaps the scoped variable being assigned to

Can this be made to work nicely with incremental [re]compilation/linking? As stated, it looks like simple, unrelated changes to a source file could cause recompilation of a lot of things.

Sahnvour avatar Jul 28 '19 22:07 Sahnvour

Such a big change. This is going to require pretty much every zig source file in existence to be updated to support this change.

It's refreshing to see that the language is still willing to make changes like this for the sake of being better.

marler8997 avatar Aug 29 '19 17:08 marler8997

@marler8997 the plan to roll this out is to have both syntaxes supported at the same time for 1 release cycle, with zig fmt converting to the canonical style. After one release cycle this way, the old syntax is removed. We are currently doing this with use/usingnamespace.

andrewrk avatar Aug 30 '19 20:08 andrewrk

Has the grammar already been updated to reflect this change?

ceymard avatar Oct 07 '19 10:10 ceymard

How would this interact with recursive calls? I can see it not being a problem at top level due to order-independence (so a recursive const f = fn() ... { f(); } could be resolved), but how would this work for a function defined inside something?

blackhole89 avatar Dec 09 '19 03:12 blackhole89

Now that #685 has landed, should anonymous function literals (function or closure) be introduced? Like:

v.map(.(i) { return i+1; });
// or
v.map(.(i) -> i+1);

mogud avatar Dec 13 '19 03:12 mogud

Wasn't a goal of Zig to stay close to the syntax of C?

Apparently javascript's syntax is better.

Sarcasm aside, I fail to see how this improves two goals of the zen of zig of "maintainability" and "Communicate intent precisely" and strongly disagree that this change should be made.

In my opinion this proposal:

  1. Does not solve any problems that cannot already be done with the existing method.
  2. Obfuscates the intent of whether a line of code is a procedure or a storage declaration (as mentioned in the counter arguments).
  3. Makes code less searchable and therefore less maintainable. With this change I cannot just search for fn someName to find the function definition but must now search for var name = fn and const name = fn, and since keywords can be added, also const name = pub fn, etc.
  4. Makes it harder to determine whether the intent is to define a function type or a function itself as the syntax is more similar.
  5. It requires at least two extra tokens to read and write.
  6. Enables repeated code like this const foo : fn() = fn() { from a comment above
  7. Encourages nested function code patterns because variables are "meant" to be passed around (as is already clear from the examples in previous comments)
  8. Will immediately lead to people wanting closures

For what it's worth, I find that zig currently is more readable and maintainable than c and javascript. Please don't "fix" what isn't broken :)

frmdstryr avatar Jan 12 '20 15:01 frmdstryr

The first time I saw this proposal I was positive about it, but after reading it a second time I have more negative feelings.

The status quo is completely fine and somewhat familiar to ANY programmer. There are no fundamental flaws in not applying variable declaration syntax to functions by default, and I do not see benefits in forcing people to think about functions as constant pointers. However, I do see benefits in 'syntactic sugar' for such a fundamental feature of the language.

Furthermore, as far as I can understand, this proposal does not solve function signature inference:

const exec = fn (operation: fn (x: u32) u32, arg: u32) u32 {
    return operation(arg);
};

exec(fn (x) { // ??? will be possible?
    return x + 1;
}, 111);

I think it's better to keep the existing syntax. To solve the use case I described, I propose anonymous function initializers(?):

const exec: fn (fn (u32) u32, u32) u32 = .|operation, arg| {
    return operation(arg);
};

exec(.|x| {
    return x + 1;
}, 111);

Rocknest avatar Jan 12 '20 23:01 Rocknest

@Rocknest that looks about the same as this post? https://github.com/ziglang/zig/issues/1717#issuecomment-444200663

Looks like three people now have been somewhat in favor of that idea, especially because of that inline anonymous function syntax w/ inferred type. Someone want to open a new proposal for that?

kavika13 avatar Jan 13 '20 06:01 kavika13