Destructuring pattern matching syntax for strings
It would be useful to be able to pattern match on the prefix of a string in the same way that we can for bitstrings.
What could the syntax be? Erlang uses bit string syntax; Elixir uses bit string syntax or the <> operator.
Would we also support this syntax for string construction? If so, we may want to ensure the compiler optimises multiple concatenations into a single Erlang bit string expression.
You could think of it as Gleam already having the syntax but not typing it correctly.
<<"hi":utf8, "there":utf8>> is a valid pattern, expression, and constant UTF-8 string, but Gleam types it as a BitString. If the type checker knew that a bit string whose every segment is a utf8 value is itself a String, then additional syntax might not be needed.
That being said, I recognize this is clunky. A little of that could be solved by having a String set the type of a segment, rather than the segment always defaulting to int:
<<"hi", "there">>
or
let a = "hi"
let b = "there"
<<a, b>>
Perhaps adding :string as a segment option a la <<"a":string>> would help the type checker and alleviate needing utf8 to be a sort of typeclass.
We could use the same syntax but it would be the first instance of overloading in the language. I think we would want to think carefully about how we design and implement this as it has big implications for the language. The big question I think is how does the compiler decide if this is supposed to return a string or a bitstring?
<<x:utf8, y:utf8>>
Unless we can answer this question well I think I would prefer another syntax like Elixir has.
A drawback of using pure concatenation is that you can't specify how long a prefix is in pattern matching. The prefix must be an exact value in patterns. x <> y is undecidable in patterns unless x is assumed to be a single character.
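To make the ambiguity concrete, here is a rough sketch that lists every way a string could be divided into x and y. It assumes string.pop_grapheme and string.append from gleam_stdlib; the names splits and do_splits are made up for illustration.

```gleam
import gleam/string

// Hypothetical helper: every (x, y) pair such that x followed by y
// equals the input. A pattern like `x <> y` would have to pick one.
pub fn splits(input: String) -> List(#(String, String)) {
  do_splits("", input)
}

fn do_splits(prefix: String, rest: String) -> List(#(String, String)) {
  case string.pop_grapheme(rest) {
    // Nothing left to move into the prefix: this is the last split.
    Error(Nil) -> [#(prefix, rest)]
    // Record the current split, then move one grapheme into the prefix.
    Ok(#(g, tail)) -> [
      #(prefix, rest),
      ..do_splits(string.append(prefix, g), tail)
    ]
  }
}
```

For "ab" this yields three candidates, so without a fixed prefix or a length rule the pattern has no single answer.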
In the parsers I've written in Gleam I usually resort to matching integers off the front of a BitString because there's no low cost, convenient way to stream graphemes off the front of a String, something like this could be nice:
case some_long_string_input {
<<"{", rest:string>> -> ...
<<"}", rest:string>> -> ...
<<a:grapheme, rest:string>> -> ...
}
where string is a variable-length String if in the tail position and otherwise an exact string, and grapheme is a single UTF-8 grapheme, which is what utf8 in pattern matches does now, though with the BitString type.
here's the equivalent with <>, also nice:
case some_long_string_input {
"{" <> rest -> ...
"}" <> rest -> ...
a <> rest ->...
}
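For comparison, a rough sketch of how the same dispatch might be written without either proposed syntax. It assumes string.pop_grapheme from gleam_stdlib, which returns the first grapheme and the remainder of the string as a Result. The function name classify and the strings it returns are made up for illustration, and this only shows the shape of the code, not a low-cost way to do it.

```gleam
import gleam/string

// Hypothetical classifier: inspects the first grapheme of the input.
pub fn classify(input: String) -> String {
  case string.pop_grapheme(input) {
    Ok(#("{", _rest)) -> "open brace"
    Ok(#("}", _rest)) -> "close brace"
    // Any other leading grapheme.
    Ok(#(_a, _rest)) -> "other grapheme"
    // pop_grapheme returns Error(Nil) for the empty string.
    Error(Nil) -> "empty input"
  }
}
```

The extra Ok(#(..)) wrapping on every branch is the noise that a dedicated prefix pattern would remove.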
case some_long_string_input {
"{" <> rest -> ...
a <> rest ->...
}
With a syntax like this I'm wondering how you express these things:
a is of a certain length
a is equal to some pre-defined variable
We could possibly do something like this:
case some_long_string_input {
"{" <> rest -> ...
a <> rest if a == aleady_existing_string ->...
}
Here we may have to traverse the guard clause to find any comparisons against pattern-defined variables in order to inline them? I think that Erlang doesn't permit the size to be specified in the guard clause like this, but I would need to check that.
case some_long_string_input {
"{" <> rest -> ...
a:4 <> rest ->...
}
I'm not hugely sold on this syntax.
express a is of a certain length
It would have to be a single character, or have some unit quantifier like you have an example of
Similarly, there's no way to match multiple elements of a list as its own list; you have to write:
[a,b,c,d, ..rest]
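As a minimal sketch of that list case: the first four elements have to be named one by one and then rebuilt into a list by hand. The function name split_four is hypothetical.

```gleam
// Hypothetical helper: take the first four elements as their own list.
pub fn split_four(items: List(Int)) -> Result(#(List(Int), List(Int)), Nil) {
  case items {
    // Each element is bound individually, then reassembled.
    [a, b, c, d, ..rest] -> Ok(#([a, b, c, d], rest))
    // Fewer than four elements.
    _ -> Error(Nil)
  }
}
```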
express a is equal to some pre-defined variable
Gleam doesn't have value pinning in patterns yeah?
this won't do what some might expect:
let a = 5
case x {
a -> "five"
_ -> "other"
}
The first branch will always match because a is a capturing variable and, as far as I understand, there's no value pinning syntax. (Because of this, Gleam may actually want to warn on variable shadowing.)
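A sketch of the workaround with a guard: bind a fresh name in the pattern and compare it against the existing value. The function name describe and the variable y are made up for illustration.

```gleam
// Hypothetical example: `a` is an existing value, not a pattern variable.
pub fn describe(x: Int, a: Int) -> String {
  case x {
    // `y` always binds; the guard does the actual comparison with `a`.
    y if y == a -> "equal to a"
    _ -> "other"
  }
}
```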
It's worth noting that there is already String / BitString polymorphism:
case "hi" {
<<"hi":utf8>> -> "this currently works"
}
and so typing any BitString that would result in a valid String as a String has some precedent
fn main(x) {
case x {
<<"hi":utf8>> -> "this currently works"
}
}
Oh wow this is a huge bug. main gets inferred as fn(a) -> String, so the function can be called with anything at all, not just strings and bitstrings. This is unsound and will crash at runtime. I'm not sure how this happened but it is certainly not by design.
If we are to have overloading of the bit string syntax we need to design a set of ergonomic and unsurprising rules for how the type inference algorithm decides whether it is a String or a BitString. It cannot be an interface or type class of some kind as Gleam does not have these.
so the function can be called with anything at all
oh word, I didn't notice that any type compiles. I thought it was deliberate String/BitString overloading, which was informing most of my thoughts here.
Maybe it was, but something went wrong somewhere.
I'm not against overloading per se (overloading the maths operators would probably be a welcome change for most people), we just need to think carefully about how it would work and how it compares to the alternatives.
I quite like the idea of using the same syntax as string interpolation -> https://github.com/gleam-lang/gleam/discussions/1086