zig Allow returning a value with an error

Sometimes when a function fails, there is extra information that you have on hand that may help the caller respond to the problem or produce a diagnostic. For example, in the parseU64 example by andrewrk here,

const ParseError = error {
    InvalidChar,
    Overflow,
};

pub fn parseU64(buf: []const u8, radix: u8) ParseError!u64 {

it would be useful if the function could return the position of the invalid character so that the caller could produce a diagnostic message.

Because Zig treats error types specially, when using errors you get a bunch of nice features, such as ! error-set inference, try/catch, and errdefer; you currently lose these features if you want to return extra diagnostic information since that information is no longer an error type.

While something like index-of-bad-character is less useful for parsing an integer, getting "bad character" with no location when parsing a 2KiB JSON blob is very frustrating! -- this is the current state of the standard library's JSON parser.

There are two workarounds available today to let this extra information get out. Neither is very ergonomic, and both work against Zig's error types:

Workaround 1: Return a tagged union

You could explicitly return a tagged union that has the extra information:

const ParseError = error {
    Overflow,
};

const ParseResult = union(enum) {
    Result: u64,
    InvalidChar: usize,
};

pub fn parseU64(buf: []const u8, radix: u8) ParseError!ParseResult {

This is unfortunate in a number of ways. First, because InvalidChar is no longer an error, you cannot propagate/handle the failure with try/catch. Second, because the InvalidChar case is no longer an error, you cannot use errdefer to clean up partially constructed state in the parser. Finally, calling the function is made messy because it can fail in two separate ways -- either in the error union, or in the explicitly returned union. This means calls that distinguish different errors (as opposed to just propagating with try) need nested switches.
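To make the two-layer handling concrete, here is a minimal sketch of Workaround 1, written against recent Zig syntax (0.11 or later). The function body is a hypothetical completion of the elided one above, not the original implementation; it leans on std.fmt.charToDigit and std.math.mul/add for digit conversion and overflow checks.

```zig
const std = @import("std");

const ParseError = error{Overflow};

const ParseResult = union(enum) {
    Result: u64,
    InvalidChar: usize, // index of the offending byte
};

// Hypothetical body, for illustration only.
pub fn parseU64(buf: []const u8, radix: u8) ParseError!ParseResult {
    var x: u64 = 0;
    for (buf, 0..) |c, i| {
        // A bad digit is reported through the union, not through the error set.
        const digit = std.fmt.charToDigit(c, radix) catch
            return ParseResult{ .InvalidChar = i };
        x = try std.math.mul(u64, x, radix); // error.Overflow on overflow
        x = try std.math.add(u64, x, digit);
    }
    return ParseResult{ .Result = x };
}

pub fn main() void {
    // First layer: the error union.
    const outcome = parseU64("12x4", 10) catch |err| switch (err) {
        error.Overflow => {
            std.debug.print("overflow\n", .{});
            return;
        },
    };
    // Second layer: the explicitly returned union.
    switch (outcome) {
        .Result => |n| std.debug.print("parsed {d}\n", .{n}),
        .InvalidChar => |i| std.debug.print("invalid character at index {d}\n", .{i}),
    }
}
```

Note that a plain `try parseU64(...)` in a caller would only propagate Overflow; the InvalidChar case would still have to be unpacked by hand.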

Workaround 2: Write to an out parameter

You could also leave the error set alone, and instead expand the contract of parseU64 to write to an out parameter whenever it returns an InvalidChar error:

pub fn parseU64(buf: []const u8, radix: u8, invalid_char_index: *usize) ParseError!u64 {

However, this makes the function's interface much messier: it now includes mutation, and there is no way to call it in a manner that signals it cannot fail, since the pointer parameter is always required (previously, a catch unreachable could handle that case). Also, it won't be immediately obvious which out parameters are associated with which errors, especially if inferred error sets are being used. In particular, it gives library writers the opportunity to sometimes re-use out parameters (in order to prevent function signatures from growing out of hand) and sometimes not (they cannot, at least, when the types aren't the same).
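For comparison, a minimal sketch of Workaround 2 under the same assumptions (hypothetical function body, recent Zig syntax). The contract that invalid_char_index is only written when error.InvalidChar is returned lives in a comment, since nothing in the type signature expresses it.

```zig
const std = @import("std");

const ParseError = error{ InvalidChar, Overflow };

// Hypothetical body, for illustration only.
// Contract (by convention only): `invalid_char_index` is written
// if and only if error.InvalidChar is returned.
pub fn parseU64(buf: []const u8, radix: u8, invalid_char_index: *usize) ParseError!u64 {
    var x: u64 = 0;
    for (buf, 0..) |c, i| {
        const digit = std.fmt.charToDigit(c, radix) catch {
            invalid_char_index.* = i;
            return error.InvalidChar;
        };
        x = try std.math.mul(u64, x, radix); // error.Overflow on overflow
        x = try std.math.add(u64, x, digit);
    }
    return x;
}

pub fn main() void {
    // The caller must supply storage even if it never expects a failure.
    var bad_index: usize = undefined;
    const n = parseU64("12x4", 10, &bad_index) catch |err| switch (err) {
        error.InvalidChar => {
            std.debug.print("invalid character at index {d}\n", .{bad_index});
            return;
        },
        error.Overflow => {
            std.debug.print("overflow\n", .{});
            return;
        },
    };
    std.debug.print("parsed {d}\n", .{n});
}
```

Here try/catch and errdefer keep working, but reading bad_index in the Overflow branch would read undefined memory, and the compiler cannot warn about it.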

Proposal: Associate each error with a type

EDIT: Scroll down to a comment for a refreshed proposal. It looks essentially the same as here but with a bit more detail. The primary difference is not associating errors with value types, but an error within a particular error-set with a type. This means no changes to the anyerror type are necessary.

I propose allowing a type to be associated with each error:

const ParseError = error {
    InvalidChar: usize,
    Overflow, // equivalent to `Overflow: void`
};

pub fn parseU64(buf: []const u8, radix: u8) ParseError!u64 {
    ......
        if (digit >= radix) {
            return error.InvalidChar(index);
        }
    ......

The value returned would be available in switches:

if (parseU64(str, 10)) |number| {
	......
} else |err| switch (err) {
	error.Overflow => {
		......
	},
	error.InvalidChar => |index| {
		......
	}
}

This allows a function which can fail in multiple ways to associate different value types with different kinds of failures, or just return some plain errors that worked how they did before.

With this proposal, the caller can use inferred error sets to automatically propagate extra information, and the callsite isn't made messy with extra out-parameters/an extra non-error failure handling switch. In addition, all of the features special to errors, like errdefer and try/catch, continue to work.

Errors in the global set would now be associated with a type, so that the same error name assigned two different types would be given different error numbers.

I'm not sure what happens when you have an error set with the same name twice with different types. This could possibly be a limited case where "overloading" a single name is OK, since instantiating an error is always zero-cost, but I'll ask what others think.

I'm fairly new to Zig, so some of the details may not be quite right, but hopefully the overall concept and proposal makes sense and isn't unfixably broken.

Jun 10 '19 02:06 CurtisFenner

I see potential in that. A world where error sets are just regular unions, but given all the syntax-level amenities of today's errors.

// a regular-ass type
const InvalidChar = struct {
    pos: usize,
};

// an error set containing different types
const ParseError = error {
    InvalidChar: InvalidChar,
    Overflow, // void
};

// merge like ya would today
const Error = ParseError || error{OutOfMemory};

fn f() void {
    parse(something) catch |err| switch (err) {
        .InvalidChar => |e| warn("bad character at {}", e.pos),
        .Overflow => warn("overflow"),
        .OutOfMemory => warn("out of memory"),
    };
}

Taking it further, perhaps all today's good stuff about errors could be applied to any type, not just unions. Maybe the error keyword "taints" a type as an error type. (Although, making errors non-unions would probably have too many effects on the language.)

const SomeError1 = error struct {
    val: usize,
    reason: []const u8,
};

const SomeError2 = error union(enum) {
    OutOfOrder,
    OutOfBounds,
    OutOfIdeas,
};

// today's, which is sugar for above
const SomeError3 = error {
    ResourceExhausted,
    DeadlineExceeded,
};

Because you could now "bloat" an error set with types of larger size, this might affect how strongly use of the global error set is discouraged.

Jun 10 '19 03:06 hryx

I remember seeing this proposed before but I can't find the issue for it. Maybe it was only on IRC?

Jun 10 '19 04:06 daurnimator

Thank you @CurtisFenner for a well written proposal

Jun 10 '19 04:06 andrewrk

This is just a tagged union.

And as they seem so useful, maybe we can add anonymous structs, so we can just use tagged unions instead of multiple return values.

Don't worry about the optimizations here. The compiler can handle that.

Jun 10 '19 21:06 shawnl

There's a previous issue here #572 (just for the record)

Jun 11 '19 05:06 ghost

because errors are assigned a unique value, how about allowing for tagged unions to use errors as the tag value? this would avoid adding new syntax to language and making this feature consistent with other constructs in the language. this tangentially relies on #1945.

/// stolen from above

const ParseError = union(error) {
    InvalidChar: usize,
    Overflow, // equivalent to `Overflow: void`
};

pub fn parseU64(buf: []const u8, radix: u8) ParseError!u64 {
    // ......
        if (digit >= radix) {
            return error{ .InvalidChar = index };
        }
    // ......
}

test "parseU64" {
	if (parseU64(str, 10)) |number| {
		// ......
	} else |err| switch (err) {
		error.Overflow => {
			// ......
		},
		error.InvalidChar => |index| {
			// ......
		}
	}
}

Jun 11 '19 05:06 emekoi

Agreeing with @emekoi, I'd like some syntactic sugar for multiple arguments to an error switch, if the type is defined in the same tagged union:

/// stolen from above

const ParseError = union(error) {
    InvalidChar: InvalidCharStruct,
    Overflow, // equivalent to `Overflow: void`

    pub const InvalidCharStruct = struct {
        i: usize,
        o: bool,
    };
};

pub fn parseU64(buf: []const u8, radix: u8) ParseError!u64 {
    // ......
        if (digit >= radix) {
            return error{ .InvalidChar = .InvalidCharStruct{index, false} };
        }
    // ......
}

test "parseU64" {
	if (parseU64(str, 10)) |number| {
		// ......
	} else |err| switch (err) {
		error.Overflow => {
			// ......
		},
		error.InvalidChar => |index, boolish| {
			// ......
		}
	}
}

Jun 11 '19 16:06 shawnl

I think what @emekoi suggested is excellent, as it removes the need for extra syntax and sidesteps the issues of increasing the size of anyerror and dealing with error names assigned different types, while still enabling the core idea here!

Jun 13 '19 02:06 CurtisFenner

return error{ .InvalidChar = index };

I assume this should be:

return ParseError{ .InvalidChar = index };

Otherwise I love the idea!

Jun 13 '19 04:06 daurnimator

that's what i wasn't sure about. would you still have to explicitly name the error even when using an inferred error set? or would you just use error as you normally would with an inferred error set?

Jun 15 '19 19:06 emekoi

Not a proposal, but something possible currently: here's a variation on OP's "Workaround 2" (the out parameter). A struct member instead of an "out" parameter. It's still not perfect, but this or Workaround 2 is still the most flexible as they make it possible to allocate memory for the error value (e.g. a formatted error message).

const Thing = struct {
    const ErrorInfo = struct {
        message: []u8,
    };

    error_info: ?ErrorInfo,

    // `allocator` could also be a parameter of an init function
    fn doSomething(self: *Thing, allocator: ...) !void {
        if (bad thing 1) {
            self.error_info = ErrorInfo {
                .message = try ...allocate a string...,
            };
            return error.ThingError;
        } else if (bad thing 2) {
            self.error_info = ErrorInfo {
                .message = try ...allocate a different string...,
            };
            return error.ThingError;
        } else {
            // happy
        }
    }
};

fn caller() void {
    var thing = Thing.init();
    defer thing.deinit(); // free allocated stuff in error_info if present

    thing.doSomething(some_allocator) catch |err| {
        switch (err) { 
            error.ThingError => {
                // this `.?` is the smelliest part of this idea
                std.debug.warn("error: {}\n", thing.error_info.?.message);
            },
            else => {
                // e.g. an OOM error from when we tried to alloc for the error message
                std.debug.warn("some other error\n");
            },
        }
        return;
    };

    std.debug.warn("success\n");
}

This might be a solution for std InStream and OutStream which currently have that annoying generic error parameter?

Also, for parsers and line numbers specifically, you don't need to include the line number in the error value itself. Just maintain it in a struct member and the caller can pull it out when catching. If these struct members aren't exclusive to failed states, then there's no smell at all here.

const Parser = struct {
    ...
    line_index: usize,

    fn parse(self: *Parser) !?Token {
        // continually update line_index, return a regular zig error if something goes wrong
    }
};

Jul 20 '19 00:07 ghost

I like @emekoi's suggestion here, but I'll note that I'd like to be able to have parseU64 return !u64 and have the error type inferred, just as we do now, and still be able to do ~~return error{ .InvalidIndex = index };.~~

Jul 20 '19 17:07 Tetralux

I guess it would actually be return error.InvalidChar{ .index = index }; - But that's still fine by me :)

Jul 20 '19 17:07 Tetralux

in your example doSomething can be cleaned up using errdefer

Jul 20 '19 18:07 emekoi

I think the issue here can be summarized by noting that zig has 2 concepts that are tied together that probably don't need to be.

1. Error Control Flow
2. Error Codes

Zig has some nice constructs that make error control flow easy to work with (try, errdefer, catch, orelse, etc). However, the only way to use them is if you return "Error Codes". If Zig provides a way to enable "Error Control Flow" with more than just "Error Codes" then applications are free to choose the best type to return error information.

Maybe Zig should be able to infer an error set that includes any type, not just error codes?

fn foo() !void{
    if (...)
        return error SomeStruct.init(...);
    if (...)
        return error.SomeErrorCode;
}

Aug 29 '19 18:08 marler8997

This C++ proposal is closely related to this Zig proposal, so it may be worth considering: http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2019/p0709r3.pdf

Oct 12 '19 02:10 gggin

I found that the Boost library Outcome also supports custom types. https://ned14.github.io/outcome/tutorial/advanced/payload/copy_file2/

Oct 25 '19 10:10 gggin

I additionally propose that coercing a union(error) to an error should be possible. That way you can still have a "returns all errors" function fn foo() anyerror!void but it only returns the error code, and not the error value.

Jan 03 '20 02:01 daurnimator

Is this not typically a job for some kind of interfaces, i.e. allow anything that implements IError to be used for the error control flow syntax? This would play nicely with proposals for wrapped primitives.

Jan 16 '20 14:01 nodefish

Here's a pattern that I would consider to be an alternative to this proposal:

https://github.com/ziglang/zig/blob/570973761a219cde2e7ab0b3a22fe0696a9b1e3e/lib/std/target.zig#L717-L826

I think it's quite reasonable.

Edit: here's usage example at the callsite:

https://github.com/ziglang/zig/blob/570973761a219cde2e7ab0b3a22fe0696a9b1e3e/src-self-hosted/stage2.zig#L677-L708
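For readers who don't want to follow the links, here is a minimal sketch of this kind of pattern applied to the thread's parseU64 example, assuming the shape is "an options struct carrying an optional pointer to a caller-owned diagnostics struct". The names Diagnostics, ParseOptions, and their fields below are hypothetical and are not copied from the linked file; the function body is likewise an illustration, written against recent Zig syntax.

```zig
const std = @import("std");

const ParseError = error{ InvalidChar, Overflow };

// Hypothetical diagnostics struct; a field is only populated
// when the matching error is returned.
const Diagnostics = struct {
    invalid_char_index: ?usize = null,
};

// Hypothetical options struct; callers that don't care about
// diagnostics simply leave `diagnostics` as null.
const ParseOptions = struct {
    radix: u8 = 10,
    diagnostics: ?*Diagnostics = null,
};

pub fn parseU64(buf: []const u8, options: ParseOptions) ParseError!u64 {
    var x: u64 = 0;
    for (buf, 0..) |c, i| {
        const digit = std.fmt.charToDigit(c, options.radix) catch {
            if (options.diagnostics) |d| {
                d.invalid_char_index = i;
            }
            return error.InvalidChar;
        };
        x = try std.math.mul(u64, x, options.radix); // error.Overflow on overflow
        x = try std.math.add(u64, x, digit);
    }
    return x;
}

pub fn main() void {
    var diags = Diagnostics{};
    const n = parseU64("12x4", .{ .diagnostics = &diags }) catch |err| {
        switch (err) {
            error.InvalidChar => std.debug.print(
                "invalid character at index {?d}\n",
                .{diags.invalid_char_index},
            ),
            error.Overflow => std.debug.print("overflow\n", .{}),
        }
        return;
    };
    std.debug.print("parsed {d}\n", .{n});
}
```

Compared with a required out parameter, a call such as `parseU64("42", .{}) catch unreachable` stays possible, and the error set is untouched, so try, catch, and errdefer work as usual. The link between error.InvalidChar and invalid_char_index is still only a convention.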

Feb 21 '20 20:02 andrewrk

The problem with returning a value with an error, is that it is the same as returning a polymorphic type, and defining that return type inside the function, instead of in the function signature. While I think we still need inferred return types (#447) for some things, this is an advanced feature, and fancier patterns, like the example, should be required in order to utilize these features.

We also need inferred types as a way of sneaking in multiple return values (through the anonymous structs we already have), which LLVM supports, but in C requires an ugly one-use-but-defined struct (and where the C ABI layout of that struct is ignored by the optimization pass).

Feb 21 '20 20:02 shawnl

I think it's reasonable, but I think it could be better by making minimal changes to the language. Specifically, I think Zig is already expressive enough to more tightly reflect the function's interface in its type signature; we just need to apply the right already-existing features. My two main complaints with what you can currently achieve:

- IDEs cannot help you associate returned errors to particular fields on the returned struct
- You cannot use a tagged union, because you want to avoid the "happy path" fields being behind a superfluous .successful case. However, using a struct instead of a tagged union means
  - Understanding how the struct is populated requires reading the documentation (if it can be trusted) or otherwise the code, whereas a tagged-union is an established pattern that doesn't require you to look elsewhere
  - You don't get immediate runtime checks that you are only reading from the correct field (see above); this requires boilerplate for setting defaults or setting up undefined
  - You potentially waste memory in the struct layout for diagnostic fields that aren't populated in other error cases / in success

An error set like error { A, B } is essentially an enum. An error union like A!B is already essentially a tagged union, union(enum) { success: B, err: A }.

This (modified) proposal is to optionally transform the "error set" concept into an "error union" concept -- noting that a tagged union with all void field types is essentially the same thing as an enum; i.e., we have just strengthened an existing concept (error sets) to another existing concept (tagged unions).

I don't think it's necessary to automatically generate the union, as emekoi suggested earlier -- we just use unions, but with tags that are errors instead of enums.

The example would look something like

pub const DiagnosticsErr = union(error) {
    UnknownCpuFeature: ?[]const u8,
    MissingArchitecture: void,
    // same as UnknownCpu: void
    UnknownCpu,
};

pub fn parse(args: ParseOptions) !Target { // infer DiagnosticsErr!Target
    ......
    // equivalent to `return DiagnosticsErr{.MissingArchitecture={}};`
    return error.MissingArchitecture;
    ......
    return DiagnosticsErr{.UnknownCpuFeature = feature_name};
}


var result = Target.parse(.......) catch |err| switch (err) {
    error.UnknownCpuFeature => |unknown_feature_name| {
        ......
    },
    else => |e| return e,
};

I think this is a relatively small change to the language (since it re-uses existing concepts and is totally backwards compatible) to make some situations much clearer without any additional runtime/compiletime cost.

Feb 27 '20 15:02 CurtisFenner

related: #786

Apr 15 '20 02:04 emekoi

I'm going to restate the proposal a little more fully now that some discussion has improved it.

Currently, Zig has a concept of error sets, which are a collection of distinct error values. They are conceptually very similar to enums, except that they have globally unique values and can be cast to anyerror. Error sets can also be combined using ||. They are written like error { MyErr1, MyErr2 }.

Error sets can be combined with any type into error union types, written E!T where E is an error set and T is any type.

Instances of an error set are error values, written like error.MyError or ErrorSet.MyError.

Error unions have a few language features that make robustly handling errors easy: they trigger errdefers, and they have try and catch operators. In addition, the E part of E!T can be inferred.

I propose generalizing error sets into a new feature which I will call tagged error sets.

They might look like this:

const TaggedErrorSet = error {
    UnknownCpuFeature: ?[]const u8,
    MissingArchitecture: void,
    UnknownCpu, // same as UnknownCpu: void
    InvalidChar: u8,
};

Note that the existing syntax error {A, B, C} still works, and means the same thing as before.

A || B works as before, but requires that any errors in common to both are associated with the same type.

I propose that an instance of a tagged-error-set can be written like this:

error.InvalidChar('$')
TaggedErrorSet.InvalidChar('$')
// or perhaps
error.InvalidChar{'$'}
TaggedErrorSet.InvalidChar{'$'}

// There are other alternatives, but I find them less parallel with the existing shorthand that
// error.E
// is equivalent to
// (error {E}).E
error { .InvalidChar = 'x' }
TaggedErrorSet { .InvalidChar = 'x' }

An error union works as before. ! combines a (tagged) error set and a type. catch and try work as before.

A switch on a tagged error set gains an argument: error.InvalidChar => |c|. This is the value that the tagged-error-set "constructor" was passed.

Like error unions, these tagged error sets are implemented as unions. Tag integer values are assigned as before in the global anyerror enum.

I don't believe this dramatically affects error-inference. The error sets are still just "combined" in a way analogous to ||, with an error raised if two errors mismatch their types.

Explicitly casting (using @as) one error set to another can drop (ie, replacing with void) any value type, including to anyerror. Casting from a smaller error set to a bigger error set may require doing a copy in the event that the union of one is larger than the union of the other. For that reason and others, I don't think implicit casting should be allowed when any of the errors have an associated non-empty type.

I'm currently writing code with a great many exit points which throw an error and are required to also set a message, and being diligent to always do both is dragging on me. I believe this proposal is a good fit for Zig because

- this proposal is essentially backwards compatible (except possibly some introspection features)
- this proposal only uses features that already exist in Zig (i.e., tagged unions) but simply weren't a part of this part of the language before
- this fits well into the "Zig zen":
  - it clarifies intent by moving a common need (reporting information alongside an error) into the type system
  - it reduces runtime crashes/bugs by
    - letting you use errdefer and try in more cases (when having to choose to use a union instead of an error)
    - ensuring that you are writing diagnostic information rather than leaving it uninitialized (when having to choose to use struct mutation with a returned error, like it's a tagged union)
  - it provides an obvious way to return diagnostic information along with an error, replacing the need to decide between mutating structs, returning non-error unions, etc

May 28 '20 17:05 CurtisFenner

One problem: what if a user tries to define an error with a name already used in another error set with a different payload type? With current semantics that's no problem, but under this proposal it has to be an error. The user now has to take into account every other error set in the program, and library authors have to make sure they don't use names that anyone else might want -- #include all over again.

This could be worked around by disallowing error type inference on functions that originate tagged errors, and requiring all tagged errors to be associated with an explicit error type, but then you have the same problem naming error types, unless you unpack and propagate errors manually, which is about as much work as just passing a diagnostic pointer.

I like the concept of this proposal, but I just don't think there's a way to do it cleanly.

Aug 18 '20 15:08 ghost

That's a good callout, @EleanorNB.

Here are my initial thoughts:

We could do nothing.

In general, adding a new error to an error set is a "breaking change" since callers need to handle the new errors, so this would only affect a function which 'passes through' error sets it didn't define. (If it handles them [exhaustively] it's not surprising that it would be broken by addition of a new error set to one of the functions it calls.) Perhaps this is uncommon enough that simply making this a caveat is OK.

In addition, this is already kind of a problem -- if two independently maintained error sets both coincidentally (i.e., without coordinating with each other) choose the same error name, a caller of a function which propagates both error sets can't distinguish between the two error cases (I am assuming that since they were chosen without coordination, they don't [necessarily] indicate the same kind of recovery must be done), which can result in incorrect error recovery. So, in a sense, failing compilation when a situation like this arises could actually be considered an improvement over the status quo, since otherwise code might 'coincidentally compile' even though it is not correctly handling a new type of error that was accidentally given the same name ("Compile errors are better than runtime crashes.")

Alternatively, we could make the type associated with the error part of its value, so error.MyErr and error.MyErr(usize) are assigned different values in the error-enum and are considered as distinct as error.Err1 and error.Err2

This would make handling errors slightly less convenient, since a switch case would need to indicate the type ('inferring' it would come back to this problem, since I would guess approximately 0 code would take advantage of any 'optional' disambiguation), but is a relatively small price to pay if this is the only issue with this proposal.

Aug 21 '20 23:08 CurtisFenner

This is the top 1 non-accepted proposal based on 👍

Oct 20 '20 17:10 Mouvedia

Relevant features in other languages seem to be polymorphic variants in OCaml, and of course exceptions in OCaml and SML. Ignore the fact that exceptions are thrown in OCaml / SML and look only at how you can extend them when you declare them. Indeed, the SML dialect Alice had the ability to declare new extensible datatypes: https://www.ps.uni-saarland.de/alice/manual/types.html#exttype

The trickiest part wrt implementation seems to be that all of the above languages have a GC and can therefore use a uniform representation of the error type (a pointer), which is probably not as easily available to Zig.

But the representation of polymorphic variants in OCaml (basically an integer derived via a hash function of the name) seems to be otherwise available. And you can do the usual set lowest bit if the whole representation fits into an integer representation (that is, if the error has no associated data).

Polymorphic variants in OCaml are allowed to have payloads of different types for the same name, it's just that you can't write a function that can return both. That is it is a compile error to write a function that tries to use two different payload types for the same tag.

Dec 15 '20 15:12 bgrundmann

An explanation of the representation can be found here: https://dev.realworldocaml.org/runtime-memory-layout.html

Dec 15 '20 15:12 bgrundmann

I'm very new to zig, and loving everything so far :) I did stumble when trying to return errors with more context, and found a lot of help on the discord community (thanks again!) and that also brought me to this issue. I think something like this proposal would be very helpful, especially for writing general purpose libraries.

I'm perhaps biased by error handling in go, python and c++, so it's possible my thoughts will change with more experience in zig, but this is my today's wish for errors:

- error data (e.g. line_num for parse errors, invalid byte value for stream decoders, etc.) should be part of the error value, rather than a separate context that the function receives. If main() called lib1.foo() that called lib2.bar(), and both lib1 and lib2 used diagnostic contexts to return errors, lib1 would have to unwrap and sometimes rewrap all of lib2's errors. "try lib2.bar()" would otherwise just return the error value without the related context that main() might have found useful to report to the user.
- It would be useful if the (error+error_data) could itself provide a format() method. This would perhaps allow more dynamic error messages without needing allocators to generate the messages.
- The ability to wrap errors with additional context might also be useful for more helpful end-user error messages (e.g. "unable to save report: libcsv failed writing /path/file.csv: No space left on device")

Mar 31 '21 12:03 happyalu
