RFC/AISlop: `match` statement and declared exceptions
This is a fairly chunky syntax PR that implements two new features that are technically unrelated, but I would like to propose together:
- The `match` control flow statement (previously discussed in #18285)
- A syntax for "declared exceptions" (a variant on #7026)
The semantics for match are described in https://gist.github.com/Keno/e08874a423e0f1df4d11a798c7ed147c. The full semantics for declared exceptions are described in https://gist.github.com/Keno/7d1fb27a003afba6de50686ccfbb6610
However, the basic gist is that you can annotate expected exception types together with return types:
function read(io, ::Type{UInt8})::Except{UInt8, Union{IOError, CancellationRequest}}
...
end
Then there is a new postfix `?` operator that propagates declared (and only declared) exceptions:
function read(io, ::Type{String})::Except{String, Union{IOError, CancellationRequest}}
String(collect(takewhile(!=(0), read(io, UInt8)? for _ in repeated(nothing))))
end
Postfix `?` propagates any declared exceptions; otherwise they get thrown. To handle exceptions more selectively, there is a new `match?` control-flow construct. It has the same semantics as `match` (described in the link above), but operates only on the declared exceptions (when on the exceptional path). Example:
function read_thing(io)::Except{String, CancellationRequest}
match? read(io, String)
# ENOENT expected, return empty string
IOError(UV_ENOENT) -> ""
end? # All other IOErrors get thrown, CancellationRequest gets propagated as a declared exception
end
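To get a feel for the propagation rule in today's Julia, here is a rough emulation sketch. Everything in it is hypothetical and not part of the PR:

- `Declared` stands in for the exceptional side of the return union.
- The `@propagate` macro approximates postfix `?`.
- The enclosing function's declared exception types are passed explicitly, because a macro cannot see the signature.

```julia
# Hypothetical wrapper: an exception that is *returned* as a declared exception
# instead of being thrown. Not the PR's implementation.
struct Declared{E<:Exception}
    err::E
end

# Hypothetical approximation of postfix `?`:
# - ordinary values pass through unchanged;
# - a declared exception whose type is in `allowed` (the enclosing function's
#   declared set) is returned to the caller early;
# - any other declared exception is thrown as an ordinary exception.
macro propagate(allowed, ex)
    quote
        local r = $(esc(ex))
        if r isa Declared
            r.err isa $(esc(allowed)) && return r
            throw(r.err)
        end
        r
    end
end

# Returns Union{Int, Declared{ArgumentError}}: an ordinary Union type.
parse_digit(c::Char) = isdigit(c) ? Int(c - '0') : Declared(ArgumentError("not a digit: $c"))

# Declares ArgumentError, so the error is propagated to the caller.
function parse_two(s::AbstractString)
    a = @propagate ArgumentError parse_digit(s[1])
    b = @propagate ArgumentError parse_digit(s[2])
    return 10a + b
end

# Declares nothing (Union{}), so the same error is thrown instead.
function parse_two_strict(s::AbstractString)
    a = @propagate Union{} parse_digit(s[1])
    b = @propagate Union{} parse_digit(s[2])
    return 10a + b
end
```

Since `parse_digit` returns an ordinary `Union`, a caller can also branch on the raw result with `isa` instead of going through `@propagate`.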
This PR is semi-functional, but entirely AI slop - do not read the implementation, you have been warned. The purpose of this PR is to provide a dummy implementation that can be used to feel out the design ergonomics and discover any unexpected corner cases.
I am not a heavy user, but I think https://github.com/Roger-luo/Moshi.jl has a pretty well-thought-out design, for reference.
There are several pattern-matching macros in the ecosystem. I've taken a look at a few of them, but this doesn't match any of them exactly. You have more freedom in syntax, but there also needs to be more uniformity with the rest of the syntax, so it works a little differently.
Capturing some discussion points from this morning:
- General dislike of the `::Except` annotation, to the extent that it's magic.
- @StefanKarpinski would still like to annotate at the call site. However, I objected to the `throws` keyword, since the call site does not handle the exception with `try`/`catch`.
One possible solution is to make `except` a keyword that could be used either on the signature or at the call site, i.e. the following are equivalent:
function error(s) except ErrorException
throw(ErrorException(s))?
end
function error(s)
throw(ErrorException(s)) except ErrorException
end
There's a little bit of a question of what to do if you have both:
function foo() except A
bar() except B
end
I think the answer is that you need to satisfy both A and B to get propagated, but I need to think that through a little more.
I like this design a lot. I like that:
- It relies on ordinary `Union` types that follow normal union type behaviour, not some special deep compiler magic. The compiler magic is only a superficial syntax layer.
- Error types are automatically "upgraded" to real exceptions when not handled. This is ergonomically great.
- They mix well with the two existing patterns: Union return types and exceptions.
I have some questions about the design
- How does it work in closures? I.e. if I do
function foo(xs)
map(xs) do i
foo(i)?
end
end
Does the ? return from the closure out to foo, or from foo to foo's caller?
- What if you annotate a function to throw `T1`, but it really throws `T2`: does the caller have to handle `T1` or `T2`? E.g. suppose I have
foo()::Except{Any} = throw(ErrorException("boo!"))?
function bar()
match? foo()
ErrorException -> 0
end
end
Does `bar()` a) return 0 because it handled the actually thrown error, or b) throw, because it does not handle the declared `Any` error?
- Is it possible to directly handle errors as an ordinary Union type? I think that would be nice. I.e. is it possible to do
y = some_erroring_function()
if y isa ExceptOrError{BoundsError}?
...
elseif y isa # etc etc
...
I have some questions about the design
- How does it work in closures? I.e. if I do
function foo(xs)
map(xs) do i
foo(i)?
end
end
Does the `?` return from the closure out to `foo`, or from `foo` to `foo`'s caller?
Returns from the closure, but the more interesting question is if you annotate an exception type on the outer function, does it apply to the closure. I haven't fully thought this through yet.
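Plain `return` inside a `do` block already behaves this way in current Julia, and this is presumably the behaviour `?` would inherit. Below is a small check; the function names are made up for illustration.

```julia
# `return` inside the do-block exits only the closure for the current element;
# the enclosing function keeps iterating over the remaining elements.
function count_visits_closure(xs)
    visits = 0
    foreach(xs) do x
        visits += 1
        x < 0 && return
    end
    return visits
end

# The same early exit written as an explicit loop leaves the enclosing function.
function count_visits_loop(xs)
    visits = 0
    for x in xs
        visits += 1
        x < 0 && return visits
    end
    return visits
end
```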
- What if you annotate a function to throw `T1`, but it really throws `T2`: does the caller have to handle `T1` or `T2`? E.g. suppose I have
foo()::Except{Any} = throw(ErrorException("boo!"))?
function bar()
match? foo()
ErrorException -> 0
end
end
Does `bar()` a) return 0 because it handled the actually thrown error, or b) throw, because it does not handle the declared `Any` error?
Returns 0.
- Is it possible to directly handle errors as an ordinary Union type? I think that would be nice. I.e. is it possible to do
y = some_erroring_function()
if y isa ExceptOrError{BoundsError}?
...
elseif y isa # etc etc
...
Yes, you can access the inner function directly. The implementation in this PR is not quite correct yet in this regard, so I can't give you the syntax for it to try out, but it'll be something like:
y = exceptcall(some_erroring_function)
(same as kwcall).
but the more interesting question is if you annotate an exception type on the outer function, does it apply to the closure.
I'm leaning towards no.
I had a bunch of thoughts on this that may be superficial or a bit bikeshed-y at this stage, but I put them in a gist in case it sparks some thoughts or is useful in future discussion: https://gist.github.com/digital-carver/be6a16b9d3d9d4faa3fbb82ee0054feb
In general, I appreciate hearing everyone's perspective - it's very easy to get attached to a particular design, so hearing input is helpful. That said, I will also caution that the same attachment can also happen for people commenting on designs, so if anybody makes a suggestion that I end up not taking up, please don't take it personally :).
Error returns vs Exceptions
The mechanism is not orthogonal; it is a locally structured extension. If you do not handle a declared exception, it gets automatically thrown. So you would expect the type hierarchy of declared exceptions to be a sub-hierarchy of the things you expect to be thrown. I do agree that there needs to be a distinction drawn between the mechanisms. That is why I didn't go with the `throws` proposal (because you can't catch them unless you forgot to handle them first). That said, I think it would be perfectly fine to call this exceptions and rebrand the current exception system as thrown exceptions.
Signal vs Clutter in return annotation ::T ?:DomainError
I was considering this, with various variations on using the ASCII ?, but I couldn't come up with a good syntax that didn't feel like ASCII line noise. Open to suggestions.
match?-end?
I think there's a misunderstanding somewhere. `match?` is intended to be an ordinary match on the exceptional part of the discriminated union (if the result is exceptional). The `end?` is not syntax, but an ordinary postfix `?` on the `match?` expression. If you don't write it, an unhandled declared exception errors.
To bikeshed the term `Except`: that choice reads to me a bit like "returns everything except Foo", which is kind of confusing. Maybe `Result`, to match Rust? Other random ideas: `Checked`, `Expect` (lol), or even no word at all, just the braces directly: `::{Int, BoundsError}`, which I believe is available syntax?
Claude informs me that we're reinventing Modula-3, which both uses the `except` keyword and allows pattern matching on the exception: https://modula3.elegosoft.com/cm3/doc/tutorial/m3/m3_45.html#SEC45
Modula-3 uses `RAISES` as the keyword in signatures.
Claude suggests `may`, as in
function foo() may IOError
print(stdout, "foo") may IOError
end
which is kinda cute (and shorter than `except`). It also follows the "name it how you would explain it" rule, as in "this function returns anything ordinarily, except it may give an IOError also".
`unless` is also a reasonable option. Actually, scratch that. Several languages use it for a negated `if` (both prefix and postfix), so it would probably clash too much with those preconceptions (and it's longer).
Re: `?` from closures: Yes, returning from the closure to the outer function is the only reasonable solution semantically. However, I'd like to note that it's usually not what people want, and it is one of my frustrations with `Result` and `try` in Rust.
Consider an example from your exceptions design document:
function map(f, arr)::Except{Any}
[f(a)? for a in arr]
end
The problem here of course is that an array comprehension creates an implicit closure, which the ? is then evaluated in. This was presumably not what was meant. This pattern is very common and quite annoying, see e.g. this Reddit thread.
I don't have a solution: it would be too weird and unmanageable if `?` did not return from the current function (i.e. the closure) but could exit more than one function. Just some food for thought.
Well the problem in general is that there isn't really a guarantee that the outer function is still on the stack. That said, for comprehensions the closure is a bit of an implementation detail, so we could maybe treat that specially. I do agree that it deserves careful thought.
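The implicit closure is visible in today's Julia: a generator expression, which is what a comprehension is collected from, stores its body as an anonymous function.

Below is a sketch of the usual workaround: an explicit loop whose early exit leaves the enclosing function. `parse_all` is a hypothetical helper, and `nothing` from `tryparse` stands in for an error result.

```julia
# The body of a generator/comprehension is stored as an anonymous function.
g = (2a for a in [1, 2, 3])
body = g.f          # the implicit closure `a -> 2a`

# Hypothetical helper: parse every string, but bail out of the *enclosing
# function* at the first failure. An explicit loop is used because an early
# exit inside `[... for s in strs]` would only leave the implicit closure.
function parse_all(strs)
    out = Int[]
    for s in strs
        v = tryparse(Int, s)
        v === nothing && return nothing
        push!(out, v)
    end
    return out
end
```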
Well the problem in general is that there isn't really a guarantee that the outer function is still on the stack.
I'm probably having a slow start to my day today, but what does this mean? Are you saying that the implicit closure function from the generator is no longer in the call tree, or that map is no longer on the call tree..?
Some other points, mostly bikeshedding since the general concept of pattern matching is well established in other languages by now:
- `match? foo()` feels really weird: why not `match foo()?`? If I understand correctly, that would also remove the need for `end?`. That can easily be missed in longer code blocks, since most users probably filter out `end` mentally at this point. Attaching a comparatively small piece of syntax with important control-flow implications to a more-or-less do-nothing delimiter token seems contradictory to me. This would also be consistent with the mental model of a tree of `if`/`elseif`.
- `_` meaning `x -> x` in `match` seems similarly odd. In other places, this syntax means "do nothing" or "ignore this", but here it would mean "pass it on". I don't think overloading meaning on a single token like that is a good idea.
- Does this enable any form of exhaustiveness checking/static analysis that usually comes with `match`? Since the design doc mentions that "Falling through to the end of a match statement is a runtime error", I assume that this is not the case.
- How does scoping work? In particular, how does it interact with closure capturing? Does each match arm introduce its own scope?
- What about matching property access, is this supported? What about indexing expressions?
Overall, it feels like the current state of this is mostly geared towards making error handling easier, rather than a general-purpose match expression. If that's the goal, why not add something like `catch e::IOError(...)` with pattern matching at that place instead, and allow multiple `catch` statements per `try`? That would be consistent with the existing mechanism, allow bubbling up unhandled things by default, and keep the amount of easy-to-miss control flow to a minimum.
I'm probably having a slow start to my day today, but what does this mean? Are you saying that the implicit closure function from the generator is no longer in the call tree, or that `map` is no longer on the call tree..?
I'm saying that in general, we cannot be guaranteed that the function that creates a closure is on the call stack, so we cannot necessarily return from it; the semantics need to make sense independently.
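Below is a minimal illustration in current Julia of code that runs after its creating function has returned, so there is no frame left to return to. The names are illustrative only.

```julia
# `make_counter` returns immediately; the closure it creates is called later.
function make_counter()
    n = 0
    return () -> (n += 1)
end

# A lazy generator: its body runs only when iterated, after `lazy_incr` returned.
lazy_incr(xs) = (x + 1 for x in xs)

counter = make_counter()     # make_counter has already returned here...
counter(); counter()         # ...yet its closure keeps running
gen = lazy_incr([1, 2, 3])   # nothing has been computed yet
vals = collect(gen)          # the generator body runs now, outside lazy_incr
```

A `?` inside either closure could not meaningfully return from `make_counter` or `lazy_incr`, which is the case the semantics have to cover.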
match? foo() feels really weird - why not match foo()?
match foo()? means something different - it matches on the returned value while propagating the exception if there is one
If I understand correctly, that would also remove the need for end?
It would not. As I said above, `end?` is not a special case. It's a postfix `?` on the `match?` expression: it turns the failure to match from a thrown exception into a declared one.
_ meaning x -> x in match seems similarly odd. In other places, this syntax means "do nothing" or "ignore this", but here it would mean "pass it on". I don't think overloading meaning on a single token like that is a good idea.
It suppresses the exceptional default case. It avoids having to invent a name just for this purpose.
Does this enable any form of exhaustiveness checking/static analysis that usually comes with match? Since the design doc mentions that "Falling through to the end of a match statement is a runtime error", I assume that this is not the case.
It could, if somebody wanted to add additional static analysis to Julia, but this is not a design objective at this point.
How does scoping work? In particular, how does it interact with closure capturing? Does each match arm introduce its own scope?
Works like let on the captures
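Below is the existing `let` behaviour being referred to. This assumes that "like `let`" means each arm binds fresh variables, which closures created in that arm then capture.

```julia
# Each `let` creates a new local binding of `x`, so each closure sees the value
# that `x` had when its `let` was entered.
fs = Function[]
x = 1
push!(fs, let x = x
    () -> x
end)
x = 2
push!(fs, let x = x
    () -> x
end)
# Without `let`, both closures would read the global `x` and return 2.
results = [f() for f in fs]
```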
What about matching property access, is this supported? What about indexing expressions?
The syntax does not care and is extensible. As for defaults provided in Base, the natural extensions to property destructuring are intended to be supported. I have not thought about indexing.
Overall, it feels like the current state of this is mostly geared towards making error handling easier, rather than a general-purpose match expression.
No, they are separate proposals, with match intended as a fully functional standalone feature. I just think they make more sense together, since otherwise there needs to be other syntax for properly handling exception cases, so might as well make it a proper independent language feature.
If that's the goal, why not add something like catch e::IOError(...) with pattern matching at that place instead, and allow multiple catch statements per try?
Because the whole point is that it's not `catch`. Also, `::` means type assert; we cannot just co-opt it for pattern matching.
Here are my comments on the match proposal specifically. I think this is less great than the exception idea.
Most importantly, I think the proposal, as-is, provides very little value over an if-else chain. Therefore, the proposal's new syntax doesn't carry its own weight. Where the proposal gives value is by enabling improved destructuring, and so the additional control-flow mechanism, identical to the existing if-else, is not a meaningful new feature. That is, I don't know why I would use a match instead of an if-else, and so the feature seems like a pointless TIMTOWTDI - but maybe I'm missing something?
In other languages, a match statement differentiates itself from an if chain by being exhaustive. This might be motivation to have this.
Some less important points of objection / questions:
- The most obvious use case for a Julian match expression is to match the type of a variable. E.g. if we have `x::Union{A, B, C}`, we would match `x` to its three possible variants. The proposal does not seem to have syntax for doing this kind of match, which I think is a shame, given that I think it's the most important use case.
- In the proposal, `return` in a match arm returns from the arm into the function, not from the function. I understand doing this allows early return from a match arm using `return`, which can simplify code in match arms. However, I think using `match` to return from functions will be a major use case, so I don't think this syntax is worth it. The simplification we gain in match arms will be offset by the extra complication that users can't return out of the function from a match arm. This point is similar to my above point about `?` in closures.
- I'm lukewarm about the default `_` syntactic sugar for `x -> x`. This provides very little value (saving five keystrokes) and is a typo risk.
Some jumbled thoughts:
Declared exceptions
- I really like the concept
- My gut reaction to a lot of the syntax is pretty negative, but I'm not really sure I have a better option than postfix `?`. I really do worry about how obscure and weird this might make some library code, though.
- I dislike the name `Except` for the reasons already described above. I think `may` is a good option, and I think in the case of
function f(x) may A
g(x) may B
end
we should treat this how we treat type assertions, i.e. first we apply the `may B`, and then we apply the `may A` at the very end, so in this case I think it'd essentially boil down to requiring that `B <: A`?
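Under that reading, an inner `may B` followed by an outer `may A` propagates only what is in both. The combination is fully consistent exactly when `B <: A`. Julia's standard type operations express this directly; the helper names below are hypothetical.

```julia
# Exception types that survive an inner `may B` and then an outer `may A`;
# anything in B but outside A would end up thrown instead.
propagated(B::Type, A::Type) = typeintersect(B, A)

# The suggested rule: nesting is consistent when everything B allows, A allows too.
consistent(B::Type, A::Type) = B <: A
```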
Match
- I'm somewhat skeptical of the backwards-compatibility idea with `match`:
match is currently used as a function name for regex matching. In addition, it is a not-uncommon variable name. We should keep these working to the extent possible. As such, match is not a keyword if it occurs (without whitespace) before ( or = or as a single identifier in any other context.
I worry this might end up being a bit fragile and lead to weird bugs.
Honestly, I kinda think that this should just be @match instead of match. The reason is that for basically all Julia keywords, the code immediately next to the keyword sometimes has special semantics, but typically if that keyword has an associated block of code, that code has 'normal' semantics (sometimes with slight variations; `struct` is the one I can think of that strays the furthest, and even it doesn't go very far).
This match statement has very different semantics and syntax from regular julia code all throughout the entire match block, which makes me think that a macro sigil is more appropriate. This unfortunately does put some annoying constraints, e.g. you can't use -> and if the way you want, and you'd need a begin before the arguments block, but those might be worthwhile constraints. Then again, maybe there's a world where we can use this as a justification for having a new class of macros where certain parsing rules can be modified (e.g. https://github.com/JuliaLang/julia/issues/36590)
Thank you for your extensive reply! I don't have time to respond to every point right now, but I'll get to it by the end of the week.
I'm saying that in general, we cannot be guaranteed that the function that creates a closure is on the call stack, so we cannot necessarily return from it, the semantics needs to make sense independently.
I think I understand now - do you mean an example like this:
foo(x)::ExceptBak = (bar(y)? for y in x)
a = foo([1,2,3])
for b in a
# do something
end
Where the return cannot be from foo because foo has already returned at that point?
Most importantly, I think the proposal, as-is, provides very little value over an if-else chain. Therefore, the proposal's new syntax doesn't carry its own weight. Where the proposal gives value is by enabling improved destructuring, and so the additional control-flow mechanism, identical to the existing if-else, is not a meaningful new feature. That is, I don't know why I would use a match instead of an if-else, and so the feature seems like a pointless TIMTOWTDI - but maybe I'm missing something?
Honestly, I've always thought so as well, which is why I thought it didn't really make sense standalone. But after a couple of days of using match on this branch, I've basically stopped writing if/else entirely and just use match everywhere. So I'm actually more positive on it as a standalone feature than I was before. Things like
return match x
::Float64 -> 20
::Float32 -> 12
x::Union{String, SubString{String}} -> sizeof(x)
x::Char -> ncodeunits(x)
x::Union{UInt64, UInt32} -> ndigits(x)
x::Union{Int64, Int32} -> ndigits(x) + (x < zero(x))
_ -> 8
end
vs
if x isa Float64
return 20
elseif x isa Float32
return 12
elseif x isa String || x isa SubString{String}
return sizeof(x)
elseif x isa Char
return ncodeunits(x)
elseif x isa UInt64 || x isa UInt32
return ndigits(x)
elseif x isa Int64 || x isa Int32
return ndigits(x) + (x < zero(x))
else
return 8
end
just feel so much nicer even without any destructuring.
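For comparison, the type-based selection in this example can already be expressed with multiple dispatch in current Julia. This is a different technique from `match`: methods are chosen by specificity rather than by order. It gives the same result here only because the cases are disjoint. `encoded_width` is a made-up name.

```julia
# One method per arm of the `match` above; dispatch picks the most specific one.
encoded_width(::Float64) = 20
encoded_width(::Float32) = 12
encoded_width(x::Union{String, SubString{String}}) = sizeof(x)
encoded_width(x::Char) = ncodeunits(x)
encoded_width(x::Union{UInt64, UInt32}) = ndigits(x)
# `ndigits` ignores the sign, so add one character for a leading minus.
encoded_width(x::Union{Int64, Int32}) = ndigits(x) + (x < zero(x))
# Fallback, playing the role of the `_ -> 8` arm.
encoded_width(::Any) = 8
```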
- The most obvious use case for a Julian match expression is to match the type of a variable. E.g. if we have `x::Union{A, B, C}`, we would match `x` to its three possible variants. The proposal does not seem to have syntax for doing this kind of match, which I think is a shame, given that I think it's the most important use case.
It does, see above.
- In the proposal, `return` in a match arm returns from the arm into the function, not from the function. I understand doing this allows early return from a match arm using `return`, which can simplify code in match arms. However, I think using `match` to return from functions will be a major use case, so I don't think this syntax is worth it. The simplification we gain in match arms will be offset by the extra complication that users can't return out of the function from a match arm.
I'm thinking about using break x instead, then return is available for function return. I do think we would need the multi-break syntax as well if we do that though.
I'm lukewarm about the default `_` syntactic sugar for `x -> x`. This provides very little value (saving five keystrokes) and is a typo risk.
This wasn't originally part of the proposal, but was added based on the experience of using it and writing this default case all over the place, which felt very annoying. Not set on it, but I did find it a nice ease-of-use improvement.
we should treat this how we treat type assertions, i.e. first we apply the `may B`, and then we apply the `may A` at the very end, so in this case I think it'd essentially boil down to requiring that `B <: A`?
Yes
I worry this might end up being a bit fragile and lead to weird bugs.
Possibly yes, but two points:
- This will be syntax versioned, so at least it won't be breaking
- Contextual keywords are not new in Julia:
julia> where(x) = x
where (generic function with 1 method)
julia> where(2)
2
though of course, the difference here is that there is an existing `match` generic function in Base. That said, I don't think it's actually that big a problem. Every position where `match` would act as a keyword is currently a syntax error, because you can't just put an identifier next to another expression.
Honestly, I kinda think that this should just be @match instead of match.
I think if we start treating it seriously and using it everywhere in base, it deserves to be syntax. You can certainly make a macro version work (which is what several packages already do), but the constraints of existing syntax make it look quite clunky in several cases.
but typically if that keyword has an associated block of code, that code has 'normal' semantics
The rhs of match arms all have normal semantics. Think of this as the identifier list in:
let a = <code>,
b = <code>
Where the return cannot be from `foo` because `foo` has already returned at that point?
Yes
I'm thinking about using break x instead, then return is available for function return. I do think we would need the multi-break syntax as well if we do that though.
this seems like a quite nice idea to me
thoughts on => instead of -> ?
Regarding the match proposal, having return not return from the function seems pretty confusing and can cause bugs when refactoring. Since nothing else works like that (for/if/etc). Could we have "break with value" instead of "return"?
val = match (a, b)
(1, n) -> begin
if n == 2
break 1
end
2
end
_ -> 3
end
Or just skip it and require folks to write an expression or factor out a function if they want to exit the match without exiting the function it's in:
# use an expression
val = match (a, b)
(1, n) -> begin
n == 2 ? 1 : 2
end
_ -> 3
end
# or use a helper
function helper(n)
if n == 2
return 1
end
return 2
end
val = match (a, b)
(1, n) -> helper(n)
_ -> 3
end
edit: I see break with value was just proposed, I missed that!
thoughts on `=>` instead of `->`?
I think this makes sense if it uses break rather than return, but would break pattern matching on pairs, which may be desirable. Since there's no pattern matching on lambdas, -> is more available syntactically.
Since nothing else works like that (for/if/etc)
do works this way.
Or just skip it and require folks to write an expression or factor out a function if they want to exit the match without exiting the function it's in:
The facility for return was based on experiences with the macro match packages, which tend to end up adding some way to do this, so I think the users want it. As I said, not partial to return though.
but would break pattern matching on pairs
it wouldn't "break" so much as "be super confusing" I guess? since right-associativity of => means the first one left-to-right is for the match, so matching on a pair would require parentheses
Sure, maybe that's not so bad:
match first(pairs(dict))
(1 => 2) => "special"
(k::Int => v) => k + v
end
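The right-associativity of `=>`, which is what forces the parentheses, can be checked in current Julia:

```julia
p = 1 => 2 => 3
# Parsed as 1 => (2 => 3): the first `=>` from the left is the outer pair.
left, right = p.first, p.second
# So a pair pattern on the left of a `=>` arm needs its own parentheses.
arm = (1 => 2) => "special"
```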
Pattern matching features
MLStyle.jl's pattern matching by @thautwarm & @Roger-luo has nice features that I would like to have in built-in pattern-matching syntax.
A few in particular come to mind:
quote patterns
julia> @match 2 begin
$(1 + 1) => "two"
end
"two"
c = ...
@match (x, y) begin
(&c, _) => "x equals to c!"
(_, &c) => "y equals to c!"
_ => "none of x and y equal to c"
end
# 1-ary deconstruction: return Union{Some{T}, Nothing}
@active LessThan0(x) begin
if x >= 0
nothing
else
Some(x)
end
end
@match -15 begin
LessThan0(a) => a
_ => 0
end # -15
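Without pattern-matching support, the same active pattern is an ordinary function returning `Union{Some{T}, Nothing}`, which the caller checks by hand. `less_than_0` and `classify` are illustrative names, not MLStyle API.

```julia
# Plain-Julia version of the `LessThan0` active pattern: `Some(x)` on a match,
# `nothing` when the pattern does not apply.
less_than_0(x) = x >= 0 ? nothing : Some(x)

function classify(v)
    m = less_than_0(v)
    # `something` unwraps the `Some`; `nothing` means "fall through to `_ => 0`".
    return m === nothing ? 0 : something(m)
end
```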
I would also like exhaustive matching for enums, with footgun protection:
@enum FRUIT Apple Banana Pear
@match f begin
apple => 1
Banana => 2
Pear => 3
end
Mistyping Apple as apple accidentally matches and binds the new identifier apple instead of matching Apple, so the Banana and Pear branches are dead code. This mistake could be detected and prevented, ideally statically.
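Because enum instances can be enumerated with `instances`, both the exhaustiveness check and the typo check are straightforward in principle. Below is a runtime sketch over the pattern names, using hypothetical helper names. A real implementation would presumably do this on the syntax, at macro-expansion or lowering time.

```julia
@enum FRUIT Apple Banana Pear

# Identifiers written as patterns that are NOT instances of the enum: these
# would silently bind a fresh variable (the `apple` footgun).
function unknown_patterns(::Type{E}, pattern_names) where {E<:Enum}
    valid = Set(Symbol(string(x)) for x in instances(E))
    return [s for s in pattern_names if s ∉ valid]
end

# Enum instances that no pattern covers (the exhaustiveness check).
function missing_cases(::Type{E}, pattern_names) where {E<:Enum}
    return [x for x in instances(E) if Symbol(string(x)) ∉ pattern_names]
end

patterns = [:apple, :Banana, :Pear]   # the arms from the example above
```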