zig icon indicating copy to clipboard operation
zig copied to clipboard

Proposal: disallow referencing errors with `ErrorSet.Foo` in favour of `error.Foo`

Open mlugg opened this issue 6 months ago • 8 comments

Error values in Zig can be constructed in three ways:

  • @errorFromInt. This is just a low-level primitive which is useful in niche scenarios (usually ABI stuff, such as passing an error as user data through a C API).
  • error.Foo. This is by far the most common way.
  • ErrorSet.Foo, where ErrorSet is an error set type. This is used incredibly rarely.

It's clear that ErrorSet.Foo is theoretically a redundant form, because it's exactly identical to @as(ErrorSet, error.Foo). The type annotation in that alternative is only relevant in very niche scenarios; in virtually all cases, you can get away with just writing error.Foo instead. In practice, people almost exclusively write error.Foo instead of ErrorSet.Foo; to the point where I encounter the latter so rarely it briefly confuses me whenever I do. As is often the case, having two ways to do the same thing hinders readability.

In principle, ErrorSet.Foo could have the benefit of resistance against typos, because accidentally writing ErrorSet.Fooo is a compile error, while error.Fooo is not. However, I don't find this argument compelling. There are two common scenarios where writing error values: in switch items (by which I mean the values to the left of =>), and in return <error value>. In both of these cases, there is already a specific error type which is expected. In the latter case, the function could have an inferred error set (i.e. !T instead of ErrorSet!T), but if a user is going to go to the effort of writing return ErrorSet.Foo everywhere, it's more valuable to redirect that effort to the return type, where all return values are guaranteed to be checked, the code becomes more readable, and generated documentation is improved.

Lastly, here are two more annoying consequences of ErrorSet.Foo:

  • They introduce a special case to Decl Literals. Usually, .Foo is equivalent to ResultType.Foo, but this is not the case for error sets (you can't write .MyError as shorthand for ResultErrorSet.MyError). This shows up as an explicit special case in the compiler implementation.
  • They allow constructing error values at comptime with @field (for instance, @field(anyerror, my_string)). This isn't used in practice, but could hugely harm readability of a codebase if it was (e.g. it could become very difficult to find where an error was returned from).

As a result of the above, I propose disallowing ErrorSet.Foo (including anyerror.Foo) in favour of error.Foo. This would also disallow @field(ErrorSet, "foo") (including @field(anyerror, "foo")), so it becomes impossible to construct error values at comptime from a string error name.

mlugg avatar May 30 '25 07:05 mlugg

  • They allow constructing error values at comptime with @field (for instance, @field(anyerror, my_string)). This isn't used in practice, but could hugely harm readability of a codebase if it was

I do in fact use exactly @field(anyerror, error_name) in practice, in generic code regarding error sets, to construct error values from comptime strings. See for example https://github.com/ziglang/zig/pull/19092 , which was rejected as "[not] needed enough to add to the standard library". This resolution reads to me as "users can already do this", not "the language should not allow this".

(e.g. it could become very difficult to find where an error was returned from).

Error return traces are supposed to show where an error was returned from. They are more effective than textual searches like grep, and work regardless of how the value was constructed.

it becomes impossible to construct error values at comptime from a string error name.

Is there any benefit to making this impossible?

rohlem avatar May 30 '25 11:05 rohlem

It sounds like this proposal as written would make https://github.com/SuperAuguste/cursed-zig-errors no longer work, which seems like a positive change (and probably a necessary one for its existence to not pose a threat to incremental, if I understand correctly).

InKryption avatar May 30 '25 11:05 InKryption

@rohlem

I do in fact use exactly @field(anyerror, error_name) in practice, in generic code regarding error sets, to construct error values from comptime strings.

I know "you shouldn't do that" can sometimes be seen as a bit of a cop-out when discussing use cases, but in this particular case I am very confident in saying that yeah, you should definitely not be constructing error names based on comptime-known strings. This screams of a poorly-designed API.

Error return traces are supposed to show where an error was returned from.

Indeed, and they're an incredibly useful feature. Nonetheless, grep-ability is still extremely important when trying to understand code without trying to actually trigger errors while running it.

In general, the ability to grep for an identifier and immediately see everywhere that name could be declared is extremely useful and can be the difference between navigating a foreign codebase being simple and borderline-impossible (ask me how I know :P). As far as I can tell, Zig has two main ways it deviates from this: reified (@Type) fields, and error values. This proposal solves the less contentious of the two.

Is there any benefit to making this impossible?

As discussed above, grep-ability is important. Eliminating "identifier metaprogramming" like this also helps readability, in this case because the error is spelled out in one place instead of potentially having to hunt through a few different call chains to understand how (for instance) two strings are created and then joined to make an error name.

@InKryption

It sounds like this proposal as written would make https://github.com/SuperAuguste/cursed-zig-errors no longer work, which seems like a positive change

Correct, and correct -- the concept that repo is based on is pretty gross (no shade to Auguste, I entirely understand that it was written as a joke / silly PoC, I've written plenty like that myself!).

and probably a necessary one for its existence to not pose a threat to incremental, if I understand correctly

Actually, no, because that repository is using an implementation-defined property (the ordering of error integers relative to one another). It so happens that with our Zig implementation, that matches the order in which errors are analyzed, and analysis is currently single-threaded; the repo relies on both of these properties to essentially emulate an ArrayList of strings which can be appended to and read back. In other words, that repository doesn't work when using incremental compilation, not because incremental compilation is violating the Zig language specification, but because that code relies on incorrect assumptions about what the Zig language specification says.

Thinking about this does lead to another proposal, which is to make @intFromError/@errorFromInt always be a runtime-known value so as to avoid implementation-defined (and, in Debug mode, perhaps inconsistent?) error integer values from affecting the success or failure of compilation, but I'll leave that one for another day.

mlugg avatar May 30 '25 12:05 mlugg

I have the following in my code:

                .error_set => |es| if (es) |errors| {
                    inline for (errors, 0..) |err_rec, index| {
                        // get error from global set
                        const err = @field(anyerror, err_rec.name);

chung-leong avatar May 30 '25 12:05 chung-leong

I have some code in the wild that relies on this - definitely not idiomatic Zig but useful for bindings: https://codeberg.org/linusg/icu4zig/src/commit/e805d44c5de403b62e9b92a9ad4921fb4c9421b1/src/error.zig#L177-L237

I don't need @field(ErrorSet, "Foo") for this but some way to return error.Foo from comptime-known "Foo" would be useful.

linusg avatar May 30 '25 12:05 linusg

you should definitely not be constructing error names based on comptime-known strings. This screams of a poorly-designed API.

I don't use @field(anyerror, x) as the primary source of any error values. (EDIT: To me, the previous comment looks like a perfectly reasonable use case doing exactly this though.) I use it exclusively in utility functions, which are already passed error values and error sets as inputs.

Status-quo makes it a compile error to include a switch case for an error that is not part of the subject's error set type. That is why switch constructions don't scale well for usage scenarios with generic error sets. (Proposal https://github.com/ziglang/zig/issues/20610 , rejected without comment, was meant to address this.) Generic utility functions, which often rely on @field(anyerror, x), scale better for this in status-quo.

This screams of a poorly-designed API.

In my opinion/experience making "bad" more difficult isn't directly helpful. Most everybody will already use "good" error.X over "bad" @field(anyerror, "X") wherever possible. If somebody uses the ugly option, it's because they couldn't use the nice option. Complicating this usage only makes the "bad" code which relies on it even less readable. (The easiest way to make people use the "good" option is to make it support their use case - to the extent it doesn't degrade the "good" option. If it does, maybe keep the "bad" option for these edge cases.)

rohlem avatar May 30 '25 15:05 rohlem

searching through the code i look over i use it personally once here https://github.com/nektro/zig-errno/blob/4c86c6bd5dcf4ed1477b0957684a80ec9d093b67/errno.zig#L1278-L1289

and bun uses it once here https://github.com/oven-sh/bun/blob/fd894f5a656b79ac7b328696efe6cc2b584077bb/src/bun.zig#L2717-L2728

something that seems worth point out about these examples is that while it is doing @field(anyerror, ...) its not creating any errors. (ie its returning an error based on a comptime string but inside a function with a discrete error set)

nektro avatar May 30 '25 19:05 nektro

Here's another instance where I resorted to using @field(...) to obtain an error:

    const error_set = es_info.error_set orelse &.{};
    var error_enum_buffer: [error_set.len]struct {
        status: old_enum_type,
        err: new_error_set,
    } = undefined;
    var signatures: [error_set.len]comptime_int = undefined;
    for (error_set, 0..) |e, index| {
        error_enum_buffer[index].err = @field(new_error_set, e.name);
        signatures[index] = asComptimeInt(e.name);
    }

The code in question transforms C functions into ones that follow the Zig convention. The lines above constructs at comptime a table mapping status enums (from the C API) to errors based on their names.

chung-leong avatar May 31 '25 08:05 chung-leong

I am hoping to soon going to go through the specific use cases given here soon and see what I think about each one (in particular, determine whether I can see a satisfactory solution for the given use case which does not depend on this language feature). If I can't, then I think at a glance that every use case proposed here would be solved by applying this diff to std/builtin.zig:

     /// This data structure is used by the Zig language code generation and
     /// therefore must be kept in sync with the compiler implementation.
     pub const Error = struct {
         name: [:0]const u8,
+        val: anyerror,
     };

...and (as a more-or-less necessary consequence of that diff) disallowing reification of error sets (i.e. disallowing @Type(.{ .error_set = ... })). The reason this works is because it looks like the uses people have for @field(ErrorSet, ...) (including @field(anyerror, ...)) are really just to instantiate an already declared error value in generic code, usually based on a name found from iterating @typeInfo of an error set. With this potential change, you could do, for instance, const err: ErrorSet = @typeInfo(ErrorSet).error_set.?[0].val; to get an error from an error set type. This still achieves the goals set out in the proposal, because this is very obviously only appropriate when you're actually doing metaprogramming on error sets, and the error name declaration will still appear verbatim somewhere in the source (in an explicit error set declaration, error{ ... }).

However, I'm not proposing this just yet; I want to go through the use cases first since I would like to avoid this extra change if it seems unnecessary. (In part this is because I have been considering a proposal to remove anyerror, so want to avoid language dependence on it. To be clear, I'm aware that that's a much bigger proposal -- I am in no way proposing that here.)

mlugg avatar Jun 01 '25 07:06 mlugg

Okay, one at a time!


@linusg thanks for the small, concrete example. I can certainly see the use case here. However, I think the code you linked is actually improved a fair bit by just doing the conversions by hand, using pretty minimal metaprogramming -- all with a barely increased LOC count, at least for this particular library. You get better static safety, (IMO) more readable code, and more versatility with how you translate errors. Take a look at this file to see: https://zigbin.io/34faae (doc comments at the top explain why I think it's an improvement).


@rohlem

You've not referenced any concrete use case, so there isn't much to respond to, but just on one point:

(Proposal https://github.com/ziglang/zig/issues/20610 , rejected without comment, was meant to address this.)

As the wiki has said for over 3 years, please do not file a proposal to change the language. Nothing I say here is an invitation to re-open that proposal or any other, even if citing a different use case.

However, for what it's worth, the example given in that proposal is taking the wrong approach: rather than trying to make the language support error sets varying based on target, just use the same error set. You need to handle the target-specific errors either way, and the prongs can get optimized away on targets where they're impossible, so there's not really any benefit to varying the error set. Meanwhile, it hugely hinders readability and documentation generation (since comptime logic is generally a surefire way to harm both of those). This is why, for instance, std.posix.OpenError contains error.InvalidUtf8, which is WASI-specific, and error.InvalidWtf8, which is Windows-specific. The only time I ever run into this inconvenience is when using inferred error sets with comptime or inline calls; this is extremely rare anyway, so not a huge deal, but could be addressed by the existing proposal #17125.


@nektro, I don't really understand why your zig-errno project exists. The errno enum already exists under std.c.E, and converting errno to a Zig error code in the "generic" way your repo does seems like a quite bad idea, because any given operation can only return a tiny subset of errno codes, so the error sets immediately become an order of magnitude bigger than they need to be, making correct error handling unrealistic.

That's why, for instance, std.posix has an explicit translation from errno to Zig errors in every function. And even if you did for some reason want to just map all errors like this with no regard for what's actually possible for a given call, the errno names are not really appropriate for Zig error names, which is why std.posix converts them all to idiomatic Zig errors -- for instance, EACCES usually becomes error.AccessDenied. To convert to names like these, you need an explicit switch anyway.

Also, a single errno code can have somewhat different interpretations based on the call; for instance, ENOTDIR from fchmod essentially just means "file not found" despite the error code, so translating that to e.g. error.NotDir would be quite confusing from a user's perspective.

For these reasons, I do not find this use case compelling.


@chung-leong

I'm sorry, the code you linked is entirely undocumented and uncommented, so I don't even know what C API it's wrapping. It is therefore impossible for me to analyze your use case with what you've provided.

mlugg avatar Jun 01 '25 09:06 mlugg

Thanks for the detailed response! I'm convinced this is a good solution, at a much larger scale you could have a bit of codegen in the build system that spits out the various fooErrorResult functions if needed. Will update icu4zig to use this approach (edit: https://codeberg.org/linusg/icu4zig/commit/ac564d829bd79a895a581c85608039d3dba216b7), no further concerns 👍

linusg avatar Jun 01 '25 10:06 linusg

@mlugg Thanks for your response.

the example given in that proposal is taking the wrong approach: rather than trying to make the language support error sets varying based on target, just use the same error set.

That's certainly another approach, however I don't see one as "right" and therefore the other being "wrong".

More distinct types (subsets of error sets, enums, tagged unions) provide more information. They make code more complex (imo this may be counteractable via tooling), but also more precise. This precision can help human readers and the compiler / tooling alike. I think there's a natural tradeoff between a high level of precision (which can arguably be higher than useful for a given use case) and simplicity.

comptime in status-quo is expressive enough to allow modeling any complexity of system, whether "right" or "wrong" - whether any one reader finds this complexity justified or not. I'd strongly prefer if the language didn't add arbitrary restrictions to this expressiveness. (Needless to add that this is a general matter of arbitrarily complex comptime logic and configuration, and not restricted to compilation-wide settings like compilation targets.)

That said, this is of course just my opinion, and I understand, respect, appreciate yours and others' differing from it.

rohlem avatar Jun 01 '25 13:06 rohlem

(i.e. disallowing @Type(.{ .error_set = ... })).

fwiw that diff could still happen while preserving reification capabilities if it were possible to use @ErrorSet(names: []const []const u8).

nektro avatar Jun 03 '25 10:06 nektro

Concerning the main issue, this proposal if implemented will cause some "type safety" problems. Zig's error.MyError is error prone (npi), that is one of the reason why we create error sets. However Zig's error set inference is very convenient. Imagine a function that can return error from multiple error sets (suppose distinct, although not relevant), we can leave the return type as !T (inferred), but in the body choose exactly what KIND of error we want to return (return ErrorSet1.Foo; ... return ErrorSet2.Bar). It's not as clear as using || to specify the possible errors, but it's still more maintainable than using error.Foo or error.Bar. It serve code self documentation. Even with explicit return type, it's still relevant, it adds context. A return ParseError.InvalidCharacter is more telling than just return error.InvalidCharacter.

Personally

  • I find ErrorSet in switch good code self documentation. I would argue in favor of more special cases for errors. Accepting ErrorSet directly in switch, for example, would make them more structured and can reveal the logic of a complex switch at glance => more readable.
  • anyerror.Foo is indeed redundant, should likely be removed.
  • I don't know about creation via @field, but in general error trace from comptime can/should be improved.

Lking03x avatar Jul 04 '25 18:07 Lking03x