The `|>` operator can cause parsing errors where normal function calls do not
I have continued working on the list representation of mixed units and came across this odd behavior. In my function, inside an if expression, calling a function with two arguments worked fine, but piping in one of the arguments gave a parse error. This seems unintentional, since I see no reason why it would make the expression ambiguous.
Below are the two function definitions and their output.
# works as expected
>>> fn unit_list_impl<A: Dim>(val: A, units: List<A>, acc: List<A>) -> List<A> =
if len(units) == 1 then
reverse(cons(val -> head(units), acc))
else
unit_list_impl(val - unit_val, tail(units), cons(unit_val, acc))
where unit_val: A =
if (len(units) > 0)
then trunc_in(head(units), val)
else error("Units list cannot be empty")
fn unit_list_impl<A: Dim>(val: A, units: List<A>, acc: List<A>) -> List<A> = if (len(units) == 1) then reverse(cons(val ➞ head(units), acc)) else unit_list_impl(val - unit_val, tail(units), cons(unit_val, acc))
where unit_val: A = if (len(units) > 0) then trunc_in(head(units), val) else error("Units list cannot be empty")
# fails due to parsing errors
>>> fn unit_list_impl<A: Dim>(val: A, units: List<A>, acc: List<A>) -> List<A> =
if len(units) == 1 then
reverse(cons(val -> head(units), acc))
else
unit_list_impl(val - unit_val, tail(units), cons(unit_val, acc))
where unit_val: A =
if (len(units) > 0)
then val |> trunc_in(head(units))
else error("Units list cannot be empty")
error: while parsing
┌─ <input:74>:6:5
│
6 │ where unit_val: A =
│ ^^^^^ Expected local variable definition after where/and
error: while parsing
┌─ <input:74>:9:9
│
9 │ else error("Units list cannot be empty")
│ ^^^^ Expected one of: number, identifier, parenthesized expression, struct instantiation, list
I tested it in the online terminal (Wasm) in Firefox.
A smaller example with a similar problem:
>>> if true then 1 |> id else 0
error: while parsing
┌─ <input:11>:1:16
│
1 │ if true then 1 |> id else 0
│ ^^ Expected 'else' in if-then-else condition
The problem is that |> has lower precedence than if … then … else (https://numbat.dev/doc/operations.html). It can be fixed using parens:
>>> if true then (1 |> id) else 0
But maybe the precedence should be changed to allow for this? I guess if we did that, it would prevent us from doing things like
if … then … else … |> function
and we would have to put the conditional into parens:
(if … then … else …) |> function
which defeats the original purpose of this operator (that you can always just add it at the end of a line to make an additional function call).
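To make the trade-off concrete, here is a toy recursive-descent parser in Rust. It is a sketch, not Numbat's actual parser: `Expr`, `parse` and the `branches_take_pipe` switch are hypothetical, and it only understands integers, `true`/`false`, identifiers, parentheses, `if … then … else …` and `x |> f` (stored as the call `f(x)`). With the switch off, branches are parsed one level above `|>`, which reproduces the `Expected 'else'` error; with it on, branches may contain `|>`, and a trailing `|> f` after the `else` branch then binds to that branch instead of the whole conditional.

```rust
/// Toy AST; `x |> f` is stored as the call `f(x)`.
#[derive(Debug, PartialEq)]
pub enum Expr {
    Num(i64),
    Bool(bool),
    Var(String),
    If(Box<Expr>, Box<Expr>, Box<Expr>),
    Call(String, Box<Expr>),
}

const KEYWORDS: [&str; 5] = ["if", "then", "else", "|>", ")"];

struct Parser {
    toks: Vec<String>,
    pos: usize,
    /// false: branches are parsed above `|>` (like the current precedence table).
    /// true: branches may contain `|>` (the alternative being discussed).
    branches_take_pipe: bool,
}

impl Parser {
    fn peek(&self) -> Option<&str> {
        self.toks.get(self.pos).map(|s| s.as_str())
    }

    fn bump(&mut self) -> Option<String> {
        let tok = self.toks.get(self.pos).cloned();
        self.pos += 1;
        tok
    }

    fn expect(&mut self, kw: &str) -> Result<(), String> {
        match self.bump() {
            Some(t) if t == kw => Ok(()),
            _ => Err(format!("Expected '{}'", kw)),
        }
    }

    /// Lowest precedence level: `lhs |> f |> g ...`
    fn pipe(&mut self) -> Result<Expr, String> {
        let mut lhs = self.conditional()?;
        while self.peek() == Some("|>") {
            self.pos += 1;
            let f = self.bump().ok_or("Expected function name after '|>'")?;
            lhs = Expr::Call(f, Box::new(lhs));
        }
        Ok(lhs)
    }

    /// The level at which conditions and branches are parsed.
    fn branch(&mut self) -> Result<Expr, String> {
        if self.branches_take_pipe { self.pipe() } else { self.conditional() }
    }

    fn conditional(&mut self) -> Result<Expr, String> {
        if self.peek() != Some("if") {
            return self.primary();
        }
        self.pos += 1;
        let cond = self.branch()?;
        self.expect("then")?;
        let then_branch = self.branch()?;
        self.expect("else")?;
        let else_branch = self.branch()?;
        Ok(Expr::If(Box::new(cond), Box::new(then_branch), Box::new(else_branch)))
    }

    fn primary(&mut self) -> Result<Expr, String> {
        match self.bump().as_deref() {
            Some("(") => {
                // Parentheses always restart at the lowest level, so they may contain `|>`.
                let inner = self.pipe()?;
                self.expect(")")?;
                Ok(inner)
            }
            Some("true") => Ok(Expr::Bool(true)),
            Some("false") => Ok(Expr::Bool(false)),
            Some(t) if t.parse::<i64>().is_ok() => Ok(Expr::Num(t.parse().unwrap())),
            Some(t) if !KEYWORDS.contains(&t) => Ok(Expr::Var(t.to_string())),
            _ => Err("Expected a value".to_string()),
        }
    }
}

pub fn parse(src: &str, branches_take_pipe: bool) -> Result<Expr, String> {
    // Pad parentheses so that a whitespace split yields one token per paren.
    let spaced = src.replace('(', " ( ").replace(')', " ) ");
    let toks = spaced.split_whitespace().map(String::from).collect();
    let mut p = Parser { toks, pos: 0, branches_take_pipe };
    let expr = p.pipe()?;
    match p.peek() {
        None => Ok(expr),
        Some(t) => Err(format!("Unexpected token '{}'", t)),
    }
}
```

Here `parse("if true then 1 |> id else 0", false)` fails with `Expected 'else'`, while the same input parses with the switch on. Flipping the switch only moves where a trailing pipe attaches; neither setting gives both behaviours without parentheses or some other delimiter.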
Unrelated: I'm excited about this unit_list feature!
> The problem is that |> has lower precedence than if … then … else
Oh, of course, I completely forgot about that. I'm not sure which is better, but perhaps a terminating token for the if-else expression would solve part of the problem? My thinking is that the `then` clause is already delimited by the `else`, making it a logical unit that can be scoped as if it were in parentheses. If the `else` clause were delimited as well, it would also have an easy-to-parse scope. It does bring other challenges, but it would disambiguate a pipe: after the delimiter, it applies to the whole if-else expression, while inside a branch it applies only to that branch. My main argument against this approach is that if-else expressions would become ugly, or that the change could reintroduce ambiguity. For good measure, here is an example syntax for the idea:
if true
then 1.5m |> floor_in(m)
else 1.5m |> ceil_in(m)
end
# equivalent to
if (true)
then ( 1.5m |> floor_in(m) )
else ( 1.5m |> ceil_in(m) )
end
# the problem with chaining:
if i < 0
then "negative"
else
if i > 0
then "positive"
else "zero"
end
end
> Unrelated: I'm excited about this unit_list feature!
I really liked the idea of it when I read about it, so I'm excited to try my hand at it!
> I'm not sure which is better, but perhaps a terminating token for the if-else expression would solve part of the problem? My thinking is that the `then` clause is already delimited by the `else`, making it a logical unit that can be scoped as if it were in parentheses. If the `else` clause were delimited as well, it would also have an easy-to-parse scope. It does bring other challenges, but it would disambiguate a pipe: after the delimiter, it applies to the whole if-else expression, while inside a branch it applies only to that branch.
I think that reasoning is correct.
> if-else expressions would become ugly
Indeed :-/
> # the problem with chaining:
what do you mean by that?
I meant that chaining if-else expressions would no longer be possible without nesting them, with one `end` per level.
I've had an idea about this. Could the syntax of the if-else be extended? To my understanding the current syntax is something like the following.
IfElse = "if" Cond "then" Expr "else" Expr
What if chaining were an actual part of the syntax, and every chain had to end with a terminating token? Something like the following:
IfElse = "if" Cond "then" Expr "else" Expr ; Regular if-then-else
IfElseChain = 1*IfElse "end" ; A chain of one or more if-then-else's terminated with a single "end" token
I'm thinking this would allow an optional `end` token in the normal case, which could be used to make the parsing of the pipe operator unambiguous and to make the nesting of if-then-else chains easier to determine. As far as I can see, it would still accept the current syntax in all cases, while enabling additional uses without parentheses or many ugly terminating tokens.
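A toy Rust sketch of the chain grammar, again not Numbat's parser. Assumptions: `end` is mandatory here (the optional variant would need extra lookahead or backtracking), branches are full expressions so they may contain `|>`, and an `else if` continues the current chain so the whole chain shares a single `end`. `Expr`, `parse` and `if_chain` are hypothetical names.

```rust
/// Toy AST; `x |> f` is stored as the call `f(x)`.
#[derive(Debug, PartialEq)]
pub enum Expr {
    Num(i64),
    Var(String),
    If(Box<Expr>, Box<Expr>, Box<Expr>),
    Call(String, Box<Expr>),
}

const KEYWORDS: [&str; 5] = ["if", "then", "else", "end", "|>"];

struct Parser {
    toks: Vec<String>,
    pos: usize,
}

impl Parser {
    fn peek(&self) -> Option<&str> {
        self.toks.get(self.pos).map(|s| s.as_str())
    }

    fn bump(&mut self) -> Option<String> {
        let tok = self.toks.get(self.pos).cloned();
        self.pos += 1;
        tok
    }

    fn expect(&mut self, kw: &str) -> Result<(), String> {
        match self.bump() {
            Some(t) if t == kw => Ok(()),
            _ => Err(format!("Expected '{}'", kw)),
        }
    }

    /// Full expression: an atom or an if-chain, followed by any number of `|> f`.
    fn expr(&mut self) -> Result<Expr, String> {
        let mut lhs = if self.peek() == Some("if") {
            self.pos += 1;
            self.if_chain()?
        } else {
            self.atom()?
        };
        while self.peek() == Some("|>") {
            self.pos += 1;
            let f = self.bump().ok_or("Expected function name after '|>'")?;
            lhs = Expr::Call(f, Box::new(lhs));
        }
        Ok(lhs)
    }

    /// Called after `if` was consumed; parses
    /// `cond then e (else if cond then e)* else e end` with a single `end`.
    fn if_chain(&mut self) -> Result<Expr, String> {
        let cond = self.expr()?;
        self.expect("then")?;
        let then_branch = self.expr()?;
        self.expect("else")?;
        let else_branch = if self.peek() == Some("if") {
            // `else if` continues the same chain and shares its `end`.
            self.pos += 1;
            self.if_chain()?
        } else {
            let e = self.expr()?;
            self.expect("end")?;
            e
        };
        Ok(Expr::If(Box::new(cond), Box::new(then_branch), Box::new(else_branch)))
    }

    fn atom(&mut self) -> Result<Expr, String> {
        match self.bump() {
            Some(t) if t.parse::<i64>().is_ok() => Ok(Expr::Num(t.parse().unwrap())),
            Some(t) if !KEYWORDS.contains(&t.as_str()) => Ok(Expr::Var(t)),
            _ => Err("Expected a value".to_string()),
        }
    }
}

pub fn parse(src: &str) -> Result<Expr, String> {
    let mut p = Parser { toks: src.split_whitespace().map(String::from).collect(), pos: 0 };
    let e = p.expr()?;
    match p.peek() {
        None => Ok(e),
        Some(t) => Err(format!("Unexpected token '{}'", t)),
    }
}
```

For example, `if c then x |> floor else x |> ceil end` keeps each pipe inside its branch, `if c then 1 else 2 end |> f` pipes the whole conditional, and `if n then 1 else if p then 2 else 0 end` needs only one `end`.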
Disclaimer: I do not really understand the current implementation of the expressions. I have never used Rust and can't really gauge how big of an undertaking this would be, so if it is a larger change then I don't mind the parentheses that much, but would probably write a paragraph about it in the documentation.
Hey,
I agree with everything that has been said before. I think the `end` is a good idea and should also be applied to function definitions 🤔
This would probably degrade the error messages quite a lot, though.
As long as the end is not mandatory, I think it would be hard to write a proper parser, and we would have to backtrack a lot.
Something that could help a bit IMO would be to introduce a new rule saying "if it's more than one line, then the end is required".
It means we could also write multi-line functions much more easily and make |> work at the beginning of the next line, like in OCaml (which greatly increases readability IMO):
fn color_hex(color: Color) -> String =
"{color -> _color_to_scalar -> hex:>8}" |>
str_replace("0x", "") |>
str_replace(" ", "0") |>
str_append("#")
# vs
fn color_hex(color: Color) -> String =
"{color -> _color_to_scalar -> hex:>8}"
|> str_replace("0x", "")
|> str_replace(" ", "0")
|> str_append("#")
end
And it would probably also help a lot for the development of a formatter later.
But in the end, after parsing fn color_hex(color: Color) -> String = "{color -> _color_to_scalar -> hex:>8}" we still don't know if the expression is done or not.
Maybe we're still parsing a binop, maybe not, and that will generate bad error messages.
So, one additional proposal: by enforcing another rule, we could know immediately what we're parsing. If the expression starts with a line break (i.e. the line ends right after the `=`), then it's multi-line.
This means that if you write:
fn color_hex(color: Color) -> String =
Then we know we're still parsing a function and will expect an end at some point.
If instead we parse:
fn color_hex(color: Color) -> String = "{color -> _color_to_scalar -> hex:>8}"
Then we move on and will fail to parse the next |>, saying we were expecting a value or something.
With a small cache saying the immediate previous thing we parsed was a function, I believe we could improve the error message of all binops easily by putting a lint saying "if the [op] was supposed to be part of the function, you must either move it on the same line or make the function a multiline function by inserting a newline right after the = sign".
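The "line break right after `=` means multi-line" rule can be checked before the body is parsed. Below is a line-based Rust sketch with hypothetical names (`split_body`, `BodyMode`, `BINOPS`); it assumes `end` sits on a line of its own, only checks a handful of operators, and its error text paraphrases the lint suggested above rather than quoting any real Numbat message.

```rust
/// Whether a definition body was written on one line or across several lines.
#[derive(Debug, PartialEq)]
pub enum BodyMode {
    SingleLine,
    MultiLine,
}

/// Hypothetical set of binary operators that cannot start a new statement.
const BINOPS: [&str; 4] = ["|>", "->", "*", "/"];

/// `after_eq` is the source text following the `=` of a `fn` or `let`.
/// Returns the mode, the body text, and the source that follows the body.
pub fn split_body(after_eq: &str) -> Result<(BodyMode, String, String), String> {
    let trimmed = after_eq.trim_start_matches(' ');
    if let Some(rest) = trimmed.strip_prefix('\n') {
        // Line break right after `=`: multi-line body, closed by a line `end`.
        let lines: Vec<&str> = rest.lines().collect();
        return match lines.iter().position(|l| l.trim() == "end") {
            Some(i) => Ok((BodyMode::MultiLine, lines[..i].join("\n"), lines[i + 1..].join("\n"))),
            None => Err("Expected 'end' to close the multi-line body".to_string()),
        };
    }
    // Otherwise the body is the rest of the current line.
    let (body, rest) = trimmed.split_once('\n').unwrap_or((trimmed, ""));
    let next = rest.trim_start();
    // A following line that starts with a binary operator gets the targeted hint.
    if let Some(op) = BINOPS.iter().find(|op| next.starts_with(**op)) {
        return Err(format!(
            "If the '{}' was supposed to be part of the definition, move it onto the same \
             line or insert a newline right after the '=' sign",
            op
        ));
    }
    Ok((BodyMode::SingleLine, body.trim().to_string(), rest.to_string()))
}
```

The decision needs only the character right after `=`, so the parser knows which mode it is in before reading the body, and the hint can be produced without backtracking.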
This means we could also write:
let a =
[1, 2, 3, 4, 5]
|> map(pow)
|> sum
And in the end, the if/then/else is probably the strangest one 🤔
(* This one is good *)
if X then bidule else truc
(* This is also good *)
if X then
bidule
else
truc
end
(* This is clearly bad *)
if X then bidule (* error here, you started an expression and have to finish on the same line *)
else truc
end
(* Should this be good? The newline was inserted at the start of a newline *)
if X then bidule else
truc
end
(* I would say no, the newline should have been inserted right after the `then` *)
This feels a bit clunky, but at the same time, I think it works and lets us provide actual good error messages. Plus, it doesn't generate all the mess that indentation-based "semantic" introduces, and keeps the parser pretty simple.
I like most of what you propose, especially the multi line approach. I am not a fan of requiring an end keyword though. In my opinion it is ugly and clutters code when added in places without ambiguity. I would prefer it to be optional unless needed for disambiguation.
One way of handling this for functions would be to accept either a double newline or an `end` to close them. We already use the double newline for this purpose in the editor, so formalizing it should not be a big deal, I think. This would preserve backwards compatibility at the cost of the extra complexity of optional/situational tokens.
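A minimal Rust sketch of the "double newline or `end`" idea for function bodies, assuming the body starts on the line after `=`. `take_body` and `Terminator` are hypothetical names, and a whitespace-only line counts as blank.

```rust
/// What closed a multi-line function body.
#[derive(Debug, PartialEq)]
pub enum Terminator {
    EndKeyword,
    BlankLine,
    EndOfInput,
}

/// `body_src` starts on the line after `=`. Collects body lines until a line
/// containing only `end`, the first blank line (a double newline), or end of input.
pub fn take_body(body_src: &str) -> (Vec<&str>, Terminator) {
    let mut body = Vec::new();
    for line in body_src.lines() {
        match line.trim() {
            "end" => return (body, Terminator::EndKeyword),
            "" => return (body, Terminator::BlankLine),
            _ => body.push(line),
        }
    }
    (body, Terminator::EndOfInput)
}
```

One consequence of this rule: an empty line inserted inside a body for readability closes the definition at that point, e.g. `take_body("  if at_start\n  then head(xs)\n\n  else rest")` stops after the `then` line.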
Not sure I got everything. What you propose is basically that we keep everything as I said, except that a double newline can replace an `end`?
This way, we can still use end when writing a one-liner, but can also write a double newline when doing a multiline function?
And we still keep the idea that a multiline function/let/if/then/else must start with a newline?
If we agree on that, then the proposal looks good to me. It removes all ambiguity from the syntax, and we always know what we're trying to parse without lookahead. I'll wait until David takes a look at the proposal before starting any implementation, but I would love that! It also opens the way for a Numbat formatter later on 👀 (something I already tried and gave up on a long time ago)
> If we agree on that, then the proposal looks good to me. It removes all ambiguity from the syntax, and we always know what we're trying to parse without lookahead.
It sounds like you got everything, and I agree with how you explained it here.
> I'll wait until David takes a look at the proposal before starting any implementation, but I would love that!
Good idea, and I look forward to seeing your PR. It's great that you're revisiting these older issues.
> It also opens the way for a Numbat formatter later on 👀 (something I already tried and gave up on a long time ago)
I should really get back to my attempt at this 😅 I have an implementation that is about 80% done, but already works for a subset of the syntax. Life got in the way before I finished it, but I should get back on it now.
Hey, I started the implementation, and I don't think the double newline thingy is that great after all 🤔
Or at least not everywhere where we could write an end like I said before.
Consider the following code:
fn element_at<A>(i: Scalar, xs: List<A>) -> A =
if i == 0
then head(xs)
else element_at(i - 1, tail(xs))
If I insert two newlines, it'll close the if, and then I need to insert two more newlines to close the fn.
That's really ugly.
Plus, if I want to shove a where in my function:
fn element_at<A>(i: Scalar, xs: List<A>) -> A =
if at_start
then head(xs)
else element_at(i - 1, tail(xs))
where at_start = i == 0
This forces us to put a newline in a place where we may not want it. And it also forbids us from putting newlines in place where we would like to put some, because it would close the function / if.
Sooo, in the end, I don't really know what to do. I don't think the idea is so great. Either we stick to the end or another keyword.
Or we could introduce a more common symbol, like brackets, if you want to make anything multiline.
I was thinking only having functions work that way, not if-else. I do see how that makes it impossible to have empty lines or even comment lines inside functions. That isn't ideal, but isn't that already the case?
> Sooo, in the end, I don't really know what to do. I don't think the idea is so great. Either we stick to the end or another keyword. Or we could introduce a more common symbol, like brackets, if you want to make anything multiline.
If we choose between the two, I think we should go with the end keyword. I don't remember where, but I think David has talked about not wanting brackets or similar for scopes like that.
> That isn't ideal, but isn't that already the case?
No, currently there is no special rule around functions. You can put some newlines in some places, but not everywhere. This works for example:
fn truc() = if 3 == 3
then 3 else 5
print(truc())
That's what is bothering me the most: I would like to find one simple and clear rule we can follow everywhere (that is not indentation-based), but it's not obvious. Having a closing keyword seems like the best option to me, but let's sleep on it a bit before committing to anything. Maybe we'll find something better.
TBH, I haven't yet thought through all of this, but I can very well imagine that having an "end" keyword might be the only reasonable solution if we don't want a whitespace-aware syntax. Would something like the following work?
We keep the simple = syntax for convenience reasons and because it looks similar to how you would write a function in physics/maths. But we only allow a single line for the body, optionally with a single newline after the equal sign:
fn f(x) = 2 x² + 1
fn g(y) =
if y > 0 then 1 else 0
And for more complicated functions, we require the use of curly braces. Maybe:
fn h(z) = {
if z > 0
then z²
else 0
}
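A rough Rust sketch of how a function body could be delimited under this proposal. Assumptions: `function_body` is a hypothetical helper that receives the text after `=`; a plain body is exactly one line, optionally preceded by a single newline; a braced body runs to the matching `}` by counting nesting; braces inside string literals (such as Numbat's `"{…}"` interpolation) are not handled.

```rust
/// `after_eq` is the source text following `=`. Returns the body text and
/// the remaining source after the body.
pub fn function_body(after_eq: &str) -> Result<(&str, &str), String> {
    // Skip spaces, then at most one newline, then the next line's indentation.
    let s = after_eq.trim_start_matches(' ');
    let s = s.strip_prefix('\n').unwrap_or(s).trim_start_matches(' ');
    if s.starts_with('{') {
        // Braced body: scan to the matching `}`, counting nested braces.
        let mut depth = 0usize;
        for (i, c) in s.char_indices() {
            match c {
                '{' => depth += 1,
                '}' => {
                    depth -= 1;
                    if depth == 0 {
                        return Ok((s[1..i].trim(), &s[i + 1..]));
                    }
                }
                _ => {}
            }
        }
        Err("Unclosed '{' in function body".to_string())
    } else {
        // Plain body: exactly one line.
        let (body, rest) = s.split_once('\n').unwrap_or((s, ""));
        if body.trim().is_empty() {
            return Err("Expected a one-line body; use braces for multi-line bodies".to_string());
        }
        Ok((body.trim(), rest))
    }
}
```

With this, `fn g(y) =` followed by one line is accepted, while a second newline before the body is rejected, which keeps the single-line rule easy to check.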
I'm not sure how well that would work with where clauses; I can't really find a logical place to put them in the multi-line example. Other than that, I think it works, especially if it is generally applicable, so that ifs and similar constructs can also be made multi-line:
fn half_rounded(z) = {
if mod(z, 2) == 0
then z/2
else {
z
|> add(1)
|> div(2)
}
}
It does, however, change the look of Numbat quite a lot, I feel. I don't know if that matters, but I think it is worth mentioning.