RFC: Make function definitions expressions
Overview
This is a proposal based on #1048 (thank you to everyone discussing in that thread). I opened this because I believe that conversation contains important ideas but addresses too many features at once.
Goals
- Provide syntactic consistency among all statements which bind something to an identifier
- Provide syntactic foundation for a few features: functions-in-functions (#229), passing anonymous functions as arguments (#1048)
Non-goals
- Closures
Motivation
Almost all statements which assign a type or value to an identifier use the same syntax. Taken from today's grammar (omitting a few decorations like align for brevity):
VariableDeclaration = ("var" | "const") Symbol option(":" TypeExpr) "=" Expression
The only construct which breaks this format is a function definition. It could be argued that a normal function definition consists of:
- an address where the function instructions begin;
- the type information (signature, calling convention) of the function;
- a symbol binding the above to a constant or variable.
Ideally, the third item (the symbol binding) could be decoupled from the other two.
Proposal
Make the following true:
- A function definition is an expression
- All functions are anonymous
- Binding a function to a name is accomplished with assignment syntax
const f = fn(a: i32) bool {
    return (a < 4);
};
Roughly speaking, assigning a function to a const would equate to existing behavior, while assigning to a var would equate to assigning a function pointer.
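For comparison, the const/var distinction described above already has a rough analogue in status-quo Zig. The following is only a minimal sketch, assuming a recent Zig release (0.11 or later) where function pointers are spelled `*const fn`; the names `lessThanFour` and `isFiftyFour` are hypothetical and not part of the proposal.

```zig
const std = @import("std");

// Hypothetical example functions, declared with the existing syntax.
fn lessThanFour(a: i32) bool {
    return a < 4;
}

fn isFiftyFour(a: i32) bool {
    return a == 54;
}

test "const names the function, var holds a function pointer" {
    // A const alias is comptime-known and behaves like the function itself.
    const f = lessThanFour;
    try std.testing.expect(f(3));

    // A var holds a pointer, which can be re-pointed at runtime.
    var g: *const fn (i32) bool = &lessThanFour;
    try std.testing.expect(g(3));
    g = &isFiftyFour;
    try std.testing.expect(g(54));
}
```

Run with `zig test` on the file. Under the proposal, the two top-level definitions would presumably be written as `const lessThanFour = fn(a: i32) bool { ... };` instead.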
Benefits
- Consistency. There is alignment with the fact that aggregate types are also anonymous.
- Syntactically, this paves the way for passing anonymous functions as arguments to other functions.
- I have a suspicion that this will make things simpler for the parser, but I'd love to have that confirmed/debunked by someone who actually knows (hint: not me).
- Slightly shrinks the grammar surface area:
- TopLevelDecl = option("pub") (FnDef | ExternDecl | GlobalVarDecl | UseDecl)
+ TopLevelDecl = option("pub") (ExternDecl | GlobalVarDecl | UseDecl)
Examples
The main function follows the same rule.
pub const main = fn() void {
    @import("std").debug.warn("hello\n");
};
The extern qualifier still goes before fn because it qualifies the function definition, but pub still goes before the identifier because it qualifies the visibility of the top level declaration.
const puts = extern fn([*]const u8) void;
pub const main = fn() void {
    puts(c"I'm a grapefruit");
};
Functions as the resulting expressions of branching constructs. As with other instances of peer type resolution, each result expression would need to be implicitly castable to the same type.
var f = if (condition) fn(x: i32) bool {
    return (x < 4);
} else fn(x: i32) bool {
    return (x == 54);
};
// Type of `g` resolves to `?fn() !void`
var g = switch (condition) {
    12...24 => fn() !void {},
    54 => fn() !void { return error.Unlucky; },
    else => null,
};
Defining methods of a struct. Now there is more visual consistency in a struct definition: comma-separated lines show the struct members, while semicolon-terminated statements define the types, values, and methods "namespaced" to the struct.
pub const Allocator = struct.{
    allocFn: fn(self: *Allocator, byte_count: usize, alignment: u29) Error![]u8,
    reallocFn: fn(self: *Allocator, old_mem: []u8, new_byte_count: usize, alignment: u29) Error![]u8,
    freeFn: fn(self: *Allocator, old_mem: []u8) void,
    pub const Error = error.{OutOfMemory};
    pub const alloc = fn(self: *Allocator, comptime T: type, n: usize) ![]T {
        return self.alignedAlloc(T, @alignOf(T), n);
    };
    // ...
};
Advanced mode, and possibly out of scope.
Calling an anonymous function directly.
defer fn() void {
    std.debug.warn(
        \\Keep it down, I'm disguised as Go.
        \\I wonder if anonymous functions would provide
        \\benefits to asynchronous programming?
    );
}();
Passing an anonymous function as an argument.
const SortFn = fn(a: var, b: var) bool; // Name the type for legibility
pub const sort = fn(comptime T: type, arr: []T, f: SortFn) void {
    // ...
};
pub const main = fn() void {
    var letters = []u8.{'g', 'e', 'r', 'm', 'a', 'n', 'i', 'u', 'm'};
    sort(u8, letters, fn(a: u8, b: u8) bool {
        return a < b;
    });
};
What it would look like to define a function in a function.
pub const main = fn() void {
    const incr = fn(x: i32) i32 {
        return x + 1;
    };
    warn("woah {}\n", incr(4));
};
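For reference, status-quo Zig can already approximate a function-in-a-function by wrapping the inner function in a local struct that serves purely as a namespace. This is a minimal sketch of that workaround, not the proposed syntax; `addOne` and `local` are hypothetical names.

```zig
const std = @import("std");

fn addOne(x: i32) i32 {
    // The struct has no fields; it only gives `incr` a place to live.
    const local = struct {
        fn incr(y: i32) i32 {
            return y + 1;
        }
    };
    return local.incr(x);
}

test "function nested via a local struct" {
    try std.testing.expectEqual(@as(i32, 5), addOne(4));
}
```

The proposal would let `incr` be bound directly with `const incr = fn(...)`, without the struct wrapper.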
Questions
Extern?
The use of extern above doesn't seem quite right, because the FnProto evaluates to a type:
extern puts = fn([*]const u8) void;
              --------------------
              this is a type
Maybe it's ok in the context of extern declaration, though. Or maybe it should look like something else instead:
extern puts: fn([*]const u8) void = undefined;
Where does the anonymous function's code get put?
I think this is more or less the same issue being discussed in #229.
Counterarguments
- Instructions and data are fundamentally separated as far as both the programmer and the CPU are concerned. Because of this conceptual separation, a unique syntax for function body declaration is justifiable.
- Status quo is perfectly usable and looks familiar to those who use C.
I love the idea overall, but wonder about the syntax a little. Defining the function and the function type is a little too close:
const A = fn(i32) void;
const B = fn(x: i32) void {};
var C: A = B;
@Hejsil just redid the stage 1 parse and probably could say if this can be parsed correctly.
Given we have syntactic sugar already in the form of optional_pointer.?, would it be possible to make pub fn foo() void {} syntactic sugar for pub const foo = fn() void {};?
@bheads Parsing fn defs and fn protos uses the same grammatical rules already, so this proposal doesn't make a difference in how similar these constructs will be.
@emekoi Given that Zig values "only one way", probably not. Pretty sure .? exists because asserting for not null is very common when calling into C. We also don't have .! (syntactic sugar for catch unreachable).
@Hejsil according to this, optional_pointer.? was, and still is, syntactic sugar for optional_pointer orelse unreachable.
@emekoi I know. We give syntactic sugar when not having it really hurts readability. try is a good example. ((a orelse unreachable).b orelse unreachable).c is a lot worse than a.?.b.?.c so we give syntactic sugar here. I don't think there is really any value in keeping the old fn syntax if we're gonna accept this proposal.
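To make the readability comparison concrete, here is a small sketch in status-quo Zig showing that the two spellings produce the same value. The types `Inner` and `Middle` are hypothetical and exist only for this illustration.

```zig
const std = @import("std");

// Hypothetical nested optionals, mirroring the a.?.b.?.c example.
const Inner = struct { c: u8 };
const Middle = struct { b: ?Inner };

test ".? is shorthand for orelse unreachable" {
    const a: ?Middle = Middle{ .b = Inner{ .c = 7 } };

    // Long form: unwrap each optional with orelse unreachable.
    const long_form = ((a orelse unreachable).b orelse unreachable).c;
    // Short form: the same unwrapping with the .? operator.
    const short_form = a.?.b.?.c;

    try std.testing.expect(long_form == short_form);
    try std.testing.expect(short_form == 7);
}
```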
@bheads To me the syntax seems consistent in that curly braces after a type instantiate that type. The only missing step towards full consistency would be parameter names. The argument list of a function introduces those variables into the function's scope.
const A = struct {a: i32}; //type expression/definition
const a = A {.a = 5}; //instantiation
const F = fn(a: i32) void; //type expression/definition
const f = F { return; }; //instantiation
When instantiating a function type (F above), I would think the parameters to be exposed via the names used in the function type definition/expression. While that might decouple their declaration from their usage, it's similar to struct definitions assigning names to their members.
Alternatively, if that seems too strange, I could see a builtin of the form @functionArg(index: comptime_int) T (or possibly @functionArgs() [...] returning a tuple (#208) / anonymous struct) to serve niche/"library" use cases.
@rohlem I've contemplated that "define/instantiate a function of named type F" idea before, but it breaks down quickly for a few reasons:
- The parameter names are not part of the actual function type. This is fine and even useful in some cases, I think.
- Imagine if you wanted to write a function that implemented function type F as specified by some other library author, but you had to use the param names that that author chose. That would cause problems, including the fact that in Zig you can't shadow or otherwise repurpose any identifiers which are currently in scope. (So if this imaginary F takes an x: i32, you'd better not already have an x in scope.) In Zig, you always get to choose your var identifiers, even for imported stdlib packages.
- Making it possible to define the body of a function without having the parameter names/types and return type visible immediately above that body would be very harmful to readability and comprehension. Not just in 6 months, but now while you are currently writing the function. Unfortunately, a @functionArg(...) builtin wouldn't help there.
I agree that level of consistency is cool and enticing, but I think in this case it clearly works against Zig's goals.
@hryx For the record, I overall agree with your stances.
- I agree that two function types (fn(a: i2) void) and (fn(b: i2) void) should compare equal. I think it would be possible to have the names as extra data in their type object anyway, which would require a couple of workarounds in f.e. comptime caching though, so it's not ideal.
- Imagine the same with a struct retrieved from a @cImport call. Status quo Zig does not (yet) feature struct member renaming (EDIT: as in aliasing), though I'd be all for a proposal akin to that idea, which could then equally apply to function types. (Defining your own struct with different names will probably work if handled carefully, but it's not 100% waterproof.) (EDIT: Now I see, I guess a function scope variable is different from a member name from a language perspective, so "shadowing" applies only to the former.)
- I agree that it harms readability, but in code that instantiates a generic function type you're already reasonably decoupled from the concrete type. While copying around the function head worked well enough up until now, I don't think there's a suitable replacement for defining a function instance like f.e. callbackType{trigger_update(); return @functionArg(0);} (EDIT: with callbackType being variable, coming f.e. from a comptime type argument). I think this would be the closest alternative and Zig-iest syntax for instantiating function types.
- The biggest argument I currently see against it would be the fact that the value of a type T in T { } now dictates how to parse the instantiation (member list vs function code), which moves us further away from context-free grammar.
Either way, just adding to the discussion. Sorry for hijacking the thread, I definitely don't think the details about decoupling parameters should stand in the way of the original proposal.
I agree with @hryx:
Defining the function and the function type is a little too close
We could approximate the switch case syntax and do something like the following, which opens the door for function expressions:
const A = fn(i32) void;
const B = fn(x: i32) void => { block; };
const X = fn(x: i32) u8 => expression;
var C: A = B;
@raulgrell That would also solve the ambiguity with braces in the return type.
@bheads yep, I think it came up in the discussion. The only weird case I could come up with, from @hryx's post:
var g = switch (condition) {
    13 => fn() !void => error.Unlucky,
    else => null,
};
What if, instead of the fat arrow (=>), we use the placeholder syntax of while and for loops?
This allows the separation of parameter names from the type specification.
Examples:
// Typical declaration
const add = fn(i32, i32)i32 |a, b| {return a + b;};
// Usable inline
const sorted = std.sort(i32, .{3, 2, 4}, fn(i32,i32)bool |lhs, rhs| {return lhs >= rhs;});
// With a predefined type.
const AddFnType = fn(i32,i32)i32;
const otherAdd = AddFnType |a, b| {return a + b;};
Additionally, in line with #585, we could infer the type of the function declaration when obvious
// Type is inferred from the argument spec of sort
// However, the function type is created from the type parameter given
// earlier in the parameters, so I'm not sure how feasible this is
const sorted = std.sort(i32, .{3, 2, 4}, .|lhs, rhs| {return lhs >= rhs;});
We could even make the definition of the function take any expression, not just a block expression, but that may be taking it too far.
I think there is a lot of potential in this feature to provide inline function definition clarity without a lot of cognitive overhead.
(Please forgive any formatting faux pas, this was typed on mobile. I'll fix them later.)
The following is already possible (version 0.4):
const functionFromOtherFile = @import("otherfile.zig").otherFunction;
_ = functionFromOtherFile(0.33);
I prefer the "standard" way of defining functions as it is more visually pleasing to me, but I don't see any real problems with this proposal either.
This is now accepted.
@williamcol3 interesting idea, but I'm going to stick to @hryx's original proposal. Feel free to make a case for your proposed syntax in a separate issue.
The path forward is:
- Update the parsers to accept both.
- Update zig fmt to update the syntax to the new canonical way.
- Wait until the release cycle is done, and release a version of zig.
- Delete the deprecated syntax from the parsers.
Extern can be its own syntax, or it can be downgraded to builtin function, which might actually help #1917.
Wasn't a goal of Zig to stay close to the syntax of C? I would say with this change, there is quite a bit of difference compared to C. This would make the step for current C developers to move to Zig way bigger.
However, the change makes sense in the current expression system of Zig and I like it, but I think that this is one extra step to overcome for C developers moving to Zig.
Extern can be its own syntax, or it can be downgraded to builtin function, which might actually help
Extern functions could just be variables with a function type, but no content:
// puts is a function value with the given function type
extern const puts : fn([*]const u8) void;
// main is a function with the implicit type
const main = fn() {
    puts("Hello, World!\n");
};
// foo is a function of type `fn()`
const foo : fn() = fn() {
    puts("called foo\n");
};
For me this seems logical: if we treat functions as values, we can also declare those values extern => consistent syntax for declaration of extern or internal functions.
// Usable inline
const sorted = std.sort(i32, .{3, 2, 4}, fn(i32,i32)bool |lhs, rhs| {return lhs >= rhs;});
the type here could be inferred (similar to enum literals), making it:
const sorted = std.sort(i32, .{3, 2, 4}, |lhs, rhs| {return lhs >= rhs;});
Which isn't a bad "short function syntax" at all.... @williamcol3 please do make another issue for your proposal.
Could the function passed to sort be comptime, so that specialization (and inlining) can occur for each distinct function that is passed?
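One data point for this question: status-quo Zig already allows a function to be taken as a comptime parameter, in which case each distinct function passed produces its own specialization and gives the optimizer a chance to inline the call. The sketch below assumes a recent Zig release; `insertionSort` and `ascending` are hypothetical names and not the standard library's sort API.

```zig
const std = @import("std");

// Hypothetical comparison function.
fn ascending(a: u8, b: u8) bool {
    return a < b;
}

// `lessThanFn` is comptime, so insertionSort is specialized per comparison function.
fn insertionSort(comptime T: type, items: []T, comptime lessThanFn: fn (T, T) bool) void {
    var i: usize = 1;
    while (i < items.len) : (i += 1) {
        const x = items[i];
        var j: usize = i;
        // Shift larger elements right until the slot for x is found.
        while (j > 0 and lessThanFn(x, items[j - 1])) : (j -= 1) {
            items[j] = items[j - 1];
        }
        items[j] = x;
    }
}

test "sort with a comptime comparison function" {
    var letters = [_]u8{ 'g', 'e', 'r', 'm', 'a', 'n', 'i', 'u', 'm' };
    insertionSort(u8, &letters, ascending);
    try std.testing.expectEqualSlices(u8, "aegimmnru", &letters);
}
```

With the proposed syntax, the third argument could presumably be written inline as an anonymous `fn(a: u8, b: u8) bool { ... }` expression instead of a named function.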
I just noticed that this proposal has been accepted and thought I'd throw my two cents in. I don't see a way of applying the extern keyword to the function definition, as extern requires that something has a name, but with this proposal function definitions would be anonymous and only the const/var they are assigned to would have a name. This would also be consistent with how extern is applied to the declaration (the pub const bit) rather than the definition/assignment of variables and types.
Why not keep it the way it is right now?
The grammar states
FnProto <- FnCC? KEYWORD_fn IDENTIFIER? LPAREN ParamDeclList RPAREN ByteAlign? LinkSection? EXCLAMATIONMARK? (KEYWORD_var / TypeExpr)
# Fn specific
FnCC
    <- KEYWORD_nakedcc
     / KEYWORD_stdcallcc
     / KEYWORD_extern
     / KEYWORD_async (LARROW TypeExpr RARROW)?
If we take the "full reroute" and make global functions also just "variables", we get this:
// main is a function with the implicit type
pub const my_c_fun = extern fn() { // the IDENTIFIER is removed here, the FnCC not
    puts("Hello, World!\n");
};
this will also work in expression context:
iterate_and_call(my_array, stdcallcc fn(x : u32) {
    put(x);
});
but: in this context, we could also infer the required calling convention by the type that is required by iterate_and_call, as the function type has to match the parameter type anyways.
EDIT:
I don't see a way of applying the extern keyword to the function definition, as extern requires that something has a name
The problem is that extern both states linkage as well as cdecl calling convention and also imports symbols from other translation units.
Where does the anonymous function's code get put?
In the current LLVM module (which, as far as the zig binary is concerned, is the only one). This question is not relevant to this discussion, unless you are discussing symbol visibility. As functions in LLVM always have names, we would have to auto-name them, probably based on the scope and the scoped variable being assigned to.
As functions in LLVM always have names, we would have to auto-name them, probably based on the scope and perhaps the scoped variable being assigned to
Can this be made to work nicely with incremental [re]compilation/linking ? As stated it looks like simple, unrelated changes to a source file could cause recompilation of a lot of things.
Such a big change. This is going to require pretty much every zig source file in existence to be updated to support this change.
It's refreshing to see that the language is still willing to make changes like this for the sake of being better.
@marler8997 the plan to roll this out is to have both syntaxes supported at the same time for 1 release cycle, with zig fmt converting to the canonical style. After one release cycle this way, the old syntax is removed. We are currently doing this with use/usingnamespace.
Has the grammar already been updated to reflect this change ?
How would this interact with recursive calls? I can see it not being a problem at top level due to order-independence (so a recursive const f = fn() ... { f(); } could be resolved), but how would this work for a function defined inside something?
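Status-quo Zig offers one partial data point: a function declared inside a local struct can call itself, because declarations inside a container are order-independent just like top-level ones. The sketch below shows only that existing behavior and says nothing definite about how a proposed `const f = fn...` inside a function body would resolve its own name; `factorial`, `local`, and `go` are hypothetical names.

```zig
const std = @import("std");

fn factorial(n: u64) u64 {
    const local = struct {
        // `go` can refer to itself because struct declarations are order-independent.
        fn go(k: u64, acc: u64) u64 {
            return if (k <= 1) acc else go(k - 1, acc * k);
        }
    };
    return local.go(n, 1);
}

test "recursion inside a function via a local struct" {
    try std.testing.expectEqual(@as(u64, 120), factorial(5));
}
```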
Since #685 landed, should anonymous function literals (function or closure) be introduced? Like:
v.map(.(i) { return i+1; });
// or
v.map(.(i) -> i+1);
Wasn't a goal of Zig to say close to the syntax of C?
Apparently javascript's syntax is better.
Sarcasm aside, I fail to see how this improves two goals of the zen of zig of "maintainability" and "Communicate intent precisely" and strongly disagree that this change should be made.
In my opinion this proposal:
- Does not solve any problems that cannot already be done with the existing method.
- Obfuscates the intent of whether a line of code is a procedure or a storage declaration (as mentioned in the counter arguments).
- Makes code less searchable and therefore less maintainable. With this change I cannot just search for fn someName to find the function definition but must now search for var name = fn and const name = fn, but since keywords can be added, also const name = pub fn, etc.
- Makes it harder to determine whether the intent is to define a function type or a function itself as the syntax is more similar.
- It requires at least two extra tokens to read and write.
- Enables repeated code like this const foo : fn() = fn() { from a comment above
- Encourages nested function code patterns because variables are "meant" to be passed around (as is already clear from the examples in previous comments)
- Will immediately lead to people wanting closures
For what it's worth, I find that zig currently is more readable and maintainable than c and javascript. Please don't "fix" what isn't broken :)
The first time I saw this proposal I was positive about it; however, after reading it a second time I have more negative feelings.
The status quo is completely fine and somewhat familiar to ANY programmer. There are no fundamental flaws in not applying variable declaration syntax to functions by default, and I do not see benefits in forcing people to think about functions as constant pointers. However, I do see benefits in 'syntactic sugar' for such a fundamental feature of the language.
Furthermore, as far as I can understand, this proposal does not solve function signature inference:
const exec = fn (operation: fn (x: u32) u32, arg: u32) u32 {
    return operation(arg);
};
exec(fn (x) { // ??? will be possible?
    return x + 1;
}, 111);
I think it's better to keep the existing syntax. To solve the use case I described, I propose anonymous function initializers(?):
const exec: fn (fn (u32) u32, u32) u32 = .|operation, arg| {
    return operation(arg);
};
exec(.|x| {
    return x + 1;
}, 111);
@Rocknest that looks about the same as this post? https://github.com/ziglang/zig/issues/1717#issuecomment-444200663
Looks like three people now have been somewhat in favor of that idea, especially because of that inline anonymous function syntax w/ inferred type. Someone want to open a new proposal for that?