Structural pattern matching / destructuring
Use cases for pattern matching:
- In the `except` clause of the `try` form. Right now the `except` clause always catches all exceptions, but it is useful to capture:
  - A particular exception, like "grep exited with status 1".
  - A certain class of exceptions, like "all non-zero exits of external commands".
- Potentially, a syntax for overloading functions. For instance in Haskell it is possible to define functions like this:

      fac 0 = 1
      fac n = n * fac (n-1)

  Pattern matching is done in the argument list.
- A general pattern matching form (case expression in Haskell, switch in C-like languages).
Desirable features of pattern matching:
- C-like switch-case on the value of a variable. POSIX sh has this functionality as the `case` command:

      case $v in
        0) do-something ;;
        1) do-something-else ;;
      esac

- Discriminate between different types, and potentially extract information. This is akin to how pattern matching in functional languages like Haskell or Erlang works. To explain why this is useful in shells, I shall use the matching of exceptions as an example.

  Right now, non-zero exit statuses from external commands are turned into exceptions with the text "some-command exited with some-status". Text is hacky to work with, so it is useful to turn that into structured data, schematically as `(exited $some-command $some-status)`, like `(exited grep 1)`. Similarly, signals can be turned into `(signalled $some-command $some-signal)`, like `(signalled grep 15)`.

  Then in the `except` clauses of a `try` form, we will want to discriminate these two cases, and extract the exit status or signal. This can work like this:

      try {
        grep some-text some-file
      } except (exited grep $status) {
        # $status now contains the exit status
      } except (signalled grep $signal) {
        # $signal now contains the signal
      }
There is a concern regarding the syntax, which is particular to the shell. Traditionally, assignments in shells elide the dollar sigil of variables: you write var=value to assign to $var, not $var=value. However, in the mock syntax I have given above, I chose to retain the dollar sign in both $status and $signal, because otherwise we won't be able to distinguish things that are part of the pattern (exited, signalled and grep), and things that we want to be assigned ($status and $signal). Moreover, in pattern matching, the top-level node is not necessarily a variable, so dollar elision won't make sense.
This is fine as long as assignments and pattern matching are totally separate constructs. However, pattern matching is really a generalization of assignment, so this disparity can be disconcerting. For one thing, the except clause currently mimics assignment, so a catch-all capture looks like:
try {
  ...
} except e {
  # do something with $e
}
With pattern matching, it will need to be
try {
  ...
} except $e {
  # do something with $e
}
:-/
Probably the syntax of the pattern does not have to mirror constructs used to build values.
In languages with pattern matching, patterns cannot be arbitrary expressions; they are restricted to certain classes of "solvable" expressions. For instance, if you have a constructor for a simple struct like Circle(center, radius), you can match a value against that constructor and extract the center and the radius. In other words, the constructor is "solvable": given a value of type Circle, the language knows how to solve for center and radius. (Note: the word "constructor" is used a la Haskell, not in the sense of constructors in e.g. Java, which can do arbitrary initialization work.)
However, if Circle is a function, you cannot do that, because the language cannot automatically know the inverse of that function.
The fact that only a very restricted set of expressions (e.g. simple constructors) can be used as patterns means that the parallelism of patterns and expressions is, in a sense, only superficial. It will be OK if we do not choose to maintain this parallelism.
Languages with pattern matching that we can steal syntax and semantics from:
- Haskell (in official tutorial, in wikibook)
- Rust (in the Rust book)
- Perl 6's `~~` operator is not pattern matching a la ML but serves a similar purpose (~~, ACCEPTS method)
The matching syntax should be able to express "or", like "(exited grep 1) or (exited grep 2)".
For another language you can look at for pattern matching, I suggest Elixir.
Being based on Erlang, it has fairly powerful pattern matching, which is extended in a few important ways:
https://elixir-lang.org/getting-started/pattern-matching.html
Not mentioned in the getting-started guide is "railway programming": the use of the `with` macro to allow routing or early exit with a return value.
Pattern matching is characteristic of Elixir and Erlang; functions commonly return tuples in the form {:ok, some_data} for successful returns, and {:error, some_error_data} for failures, as opposed to raising exceptions.
The most common use of pattern matching in Elixir is very similar to destructuring in JS; it allows an easy way to assign variables to values in a map/list:
x = %{foo: "foo", bar: "bar", baz: "baz", bork: "bork"}
%{foo: f, bar: b} = x
f
> "foo"
b
> "bar"
This last usage could be extremely useful in a shell, alleviating the need for many tools such as awk or cut.
I recommend not implementing this feature, despite being a former fan of Perl's use of regexes everywhere. The initial follow-up comment about matching exceptions is a good example of why this is a bad idea. Using regex matching of exceptions in that manner is guaranteed to result in false positives and false negatives, if for no other reason than that most people will not write sufficiently strict regexes for those situations.
I am a fan of regexes and tend to use them even where their use is problematic and a non-regex solution would be preferable. Nonetheless, please do not make regexes a core feature of Elvish.
@krader1961 Not that I disagree with you, but are you attacking a straw man here? I don't see a proposal to rely on regexes to filter exceptions; rather, I see a proposal to use the structure of a (future) exception object to pattern match on.
Still, the need to filter on exceptions using pattern matching goes away if there is a way for the exception handler to conditionally rethrow the exact exception that was raised in the first place. If we have that, I think this can wait at least until after 1.0, as this belongs to the “possibly nice to have” category, rather than being an essential language feature.
@krader1961 as @hanche said, this issue is about pattern matching as found in Haskell, ML and Rust, which is based on the structure of composite values, not on regexes. The "name collision" is unfortunate but I'm not aware of an alternative name for this feature.
Gah! Must not comment late at night after smoking a joint :smile: Sorry for the noise. Yes, some clean syntax for seeing if a composite value meets specific criteria might be useful. On the other hand it's not obvious, at least from the examples in the second comment, that we need more than better introspection of exceptions. So that you can use a lambda such as { and (eq $e[reason] signaled) (eq $e[cmd] grep) (eq $e[signal] $signal) }?
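The introspection-plus-lambda alternative can be sketched as follows in Python, assuming (hypothetically) that an exception exposes `reason`, `cmd` and `signal` fields as in the lambda above. A plain dict stands in for the exception object, and `signalled_grep` is a made-up name.

```python
def signalled_grep(signal):
    """Build a predicate for 'grep was killed by this signal'."""
    return lambda e: (
        e.get("reason") == "signaled"
        and e.get("cmd") == "grep"
        and e.get("signal") == signal
    )
```

Compared with a pattern, the predicate can only answer yes or no; pulling the signal number out of the exception is a separate step.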
@xiaq just throwing this out: if you are considering structural pattern matching, how far would we be from destructuring? It's one of the things from Clojure I miss in other languages.
We already have a very small – and very handy – amount of destructuring: Namely, the @list notation for left hand sides and lambda lists / formal parameters. It would be nice if any new destructuring facility would match the intuitive feel of this. See also #584.
I've been toying with ideas for a more lightweight syntax for exception handling:
case ?(cmd) {
  .ok => echo 'Success!'
  .err e => echo 'Error: '$e
}
!(cmd) would capture not only the exception but also stdout:
case !(cmd) {
  .ok output => echo 'Output: '$output
  .err e => echo 'Error: '$e
}
This is not unlike what @paradox460 was saying above about Erlang:
> functions commonly return tuples in the form `{:ok, some_data}` for successful returns, and `{:error, some_error_data}` for failures, as opposed to exceptions.
But instead of tuples and symbols, I was thinking more in terms of sum types and Either.
I vaguely resonate with the feeling that pattern matching can be useful for error handling. However, outputs and errors are not mutually exclusive in Elvish, so modelling this as an Either is not appropriate.
@tfga @xiaq actually I like the approach V has taken, namely to have both orthogonal concepts at the same time.
E.g. plain sum types (fn x() int|string|none { ... }) can contain none (why not, it's a legitimate result: sometimes you don't want to return anything but just return from the function) but can't contain anything error-related, as sum types in V always signify a useful value and never an alternative computing branch.
For an alternative computing branch (i.e. errors, exceptions, panics, ... you name it) V writes this: fn x() ?(int|string|none) { ... }. It's the same as the signature above with the addition of ? in front, which forces the programmer at compile time to handle the alternative branch (V uses the x() or { print( err ) } syntax for that, and x()? as sugar for propagation upwards).
So it's really two orthogonal things: the value (which can be none/nil/...) and the alternative branch. This is unlike many other languages, which (ab)use nil/none/... as an error indicator etc.