prql
prql copied to clipboard
Syntax to break an expression over multiple lines?
The PRQL syntax currently usually (with the exception of lists?) treats newlines as terminating characters for statements. (And please do correct me if I'm wrong about this!)
For long expressions (and perhaps also statements?), it would be nice to have a syntax that would allow users to break that expression up over multiple lines.
This idea is motivated by the discussion in https://github.com/prql/prql/issues/788.
Is there a current way to do this? If not, would parentheses be the natural way to express this?
This is a good point.
I don't think we should need it that often — one of the benefits of PRQL should be to split up logic into modular pieces, represented by transforms on separate lines.
But it would be presumptuous to assume we can always do this, and #788 is a good case — even if the "split up into multiple filters" suggestion is reasonable, it doesn't cover the case of a long list of or
-ed conditions.
Newlines have an important semantic meaning, so we've "used up" that syntax. So a couple of options:
- Use
\
to allow line breaks - Find the cases that force long lines and offer a solution to them (e.g. an
any
function to take a list of conditions and string them together withor
s)
Any thoughts?
My comment around parentheses was wondering if you could do something like what Python does to "free up" the newline syntax inside of parentheses:
\
also seems like a decent solution to me
Yes, interesting. So we could have a rule that finishing the line on an operator (e.g. +
/ or
/ etc) causes a line continuation.
Are there any instances where we want to finish on an operator?
One concern is that while editing a line in the middle of a query, the error messages might be quite confusing:
from foo
derive bar = baz or<cursor>
aggregate (
max fuz
)
...would this have a confusing error message saying "baz or aggregate
is invalid"?
Yes, interesting. So we could have a rule that finishing the line on an operator (e.g.
+
/or
/ etc) causes a line continuation.
But that's maybe valid only inside a set of parens? (I'm not sure what the full Python parsing logic is.) Additionally, it might also be valid if the next line starts with an operator, which is that last Python example (inside parens).
...would this have a confusing error message saying "
baz or aggregate
is invalid"?
"Unexpected newline"?
"Unexpected newline"?
The challenge is that it wouldn't be parsed as a newline, it would be parsed as derive bar = baz or aggregate (
, unless I'm misunderstanding something?
But that's maybe valid only inside a set of parens?
Yes, this could be possible. It starts to get a bit complicated — copy something from inside parens to outside parens and it doesn't work any longer.
But python is good prior art, I think this should be OK.
Additionally, it might also be valid if the next line starts with an operator, which is that last Python example (inside parens).
Yes, good idea. We'll have to see whether pest supports this, I think it should be possible with positive predicates
Cool! And to be clear, I'm just offering my random thoughts as a fan in the peanut gallery. Appreciate all the work you've already done!
Cool! And to be clear, I'm just offering my random thoughts as a fan in the peanut gallery. Appreciate all the work you've already done!
Thoughts generally welcome, and in the specific case they're also great ideas @ajfriend !
I'd argue against trying to parse newlines after operators or inside parenthesis. I like \
far more, regardless of the fact that I find it quite ugly.
That's because newline (as a token) is already "used up" on higher levels of parsing, as @max-sixty said. Precedence goes roughly like:
newline, |
and, or
<, >, <=, >=
+, -
*, /
(), .., []
Simplified, tokens are first split by newline
s and |
, then by and
and or
, and so on.
Now, if we would want to ignore newlines in ()
or after +
, we'd have to parse ()
and +
higher in precedence list, which would add a lot of complexity.
But why can python support this? Python has two precedence lists: one for expressions and one for statements. Newline is a part of statement precedence list and it means "end of statement". And because statements cannot be part of expressions, and because (
indicates unfinished expression, Python can only use expression precedence list and resolve newline to mean "whitespace".
Back to PRQL, newlines can be part of expressions - actually the whole pipeline is an expression! That's why even in ()
, newlines must mean "pipe".
I came up with another idea: newlines meaning "pipe" make sense only in pipelines. Pipelines can be just top-level plain function calls, or function calls wrapped in ()
. But they cannot be function calls in []
! Which means that without much hussle, I think we can make this work:
derive [
some + long + expr +
spans + multiple + lines
]
or
derive [some + long + expr +
spans + multiple + lines]
I have a question, not about future multiline syntax, but about a potential workaround: perhaps in Prql 0.3.1, we can use currying to add arguments one at a time. Perhaps not.
This is not meant as a derailing feature request, only as a "can this workaround be made to work" query. If it can't, no worries.
Alright, the would-be workaround:
# This function takes three arguments, so maybe you want to write it on multiple lines
func addthree<column> a b c -> s"a + b + c"
# 'arg x' takes a curried function and adds x to the end of its argument list.
# addthree a | arg b -> addthree a b
# because newlines are equal to pipes, this allows building up a function's arguments over multiple lines
func arg myarg myfunc -> ( myfunc myarg )
from mytable
derive [
my_thing = ( # a multiline pipe that builds up a function call
addthree apples # addthree apples _ _
arg bananas # addthree apples bananas _
arg citrus # `addthree apples banana citrus`, a.k.a. `apples + bananas + citrus`
)
]
Compiling this with 0.3.1 gives an error (below) that I think I can summarize as "I expected a column expression, but I found a function with signature infer -> infer
". Is it possible to write the code differently so it works in 0.3.1? If not, no matter. Thanks for all you've done and are doing on PRQL!
# Linebreaks added. Character span not accurate.
Error {
span: Some(span-chars-110-176),
reason: Expected {
who: Some("function std.derive, param `columns`"),
expected: "type `column`",
found: "type `func infer -> infer`"
},
help: None
}
Woah, a great workaround using currying! Maybe it was worth implementing :smile:
Your example should work and the fact that it doesn't is a bug. I'll look into it. There seems to be a problem with pipelines, because normal function calls do work (but defeats the whole purpose of the workaround):
func addthree<column> a b c -> s"{a} + {b} + {c}"
func arg myarg myfunc -> ( myfunc myarg )
from mytable
derive [
my_thing = (
arg "citrus" (arg "bananas" (addthree "apples"))
)
]
Cancel that, even my example produces a wrong output.
Well, thank you very much for taking a look. PRQL is a beautiful thing. Have a good slide into the new year!
A thought about backslash \
for line continuations: it can interact badly with line comments.
- In languages where
\
at the end of a line indicates a line continuation ... - ... and '#' comments all characters until the end of the line ...
- ... in those languages, you can't have line comments on multiline expressions.
For example, this is invalid bash:
tar \
-x # extract \
-z # gzipped \
-f # read from file \
myfile.tar.gz
and so is this:
mycommand \
| head --bytes=-2 # remove spurious CR LF from end \
| tail +2 # remove header
Some potentional ways to allow comments AND line continuations, if that is important to you.
Contrived example featuring addthree. Find shows with free places, by adding up how many places each kind of ticket represents, and then comparing that to the theatre's capacity.
Flavour 1: 'backslash EOL' for line continuation. 'backslash space comment EOL' also for line continuation. 'EOL' for pipe.
from tickets_per_show
filter addthree \
seats \
couches * 2 \ # couches are double-sized
standing / 2 \ # standing people take less room
!= max_capacity
Flavour 2: 'EOL space backslash' for line continuation. 'EOL' for pipe.
In other words, line that starts with spaces + backslash gets glued to previous line, after comment removal.
from tickets_per_show
filter addthree
\ seats
\ couches * 2 # couches are double-sized
\ standing / 2 # standing people take less room
\ != max_capacity
Flavour 3: infix comments
from tickets_per_show
filter addthree \
seats \
couches * 2 /* couches are double-sized */ \
standing / 2 /* standing people take less room */ \
!= max_capacity
Flavour 4: line continuations allow separate-line comments, not end-of-line comments,
from tickets_per_show
filter addthree \
seats \
# couches are double-sized
couches * 2 \
# standing people take less room
standing / 2 \
!= max_capacity
Good breakdown of options for the backslash escaping route. I prefer flavour 4 the most.
But as I said, backslash looks ugly (that's just my preference) and has to be adjusted when adding a new line at the end. It's like a trailing comma. And we cannot allow trailing backslashes...
I agree for Flavour 4! (I think the example is missing +
s — is that right? Or I'm misunderstanding the example.)
Overall I'd be fine with this; it doesn't conflict with other syntax. I think we should try and reduce the amount it's needed — but sometimes it will be, as the example shows.
And agree with @aljazerzen on this point; specifically that it's not "composable", in the same way as a trailing comma isn't:
has to be adjusted when adding a new line at the end. It's like a trailing comma. And we cannot allow trailing backslashes...
If I can suggest something to the design — trailing spaces after the backslash should be ignored. In bash, \
does not count as a trailing backslash, which can be really confusing.
As discussed on Discord — are we broadly in agreement on this option?
Flavour 4: line continuations allow separate-line comments, not end-of-line comments,
from tickets_per_show filter addthree \ seats \ # couches are double-sized couches * 2 \ # standing people take less room standing / 2 \ != max_capacity
Note:
(I think the example is missing
+
s — is that right? Or I'm misunderstanding the example.)
This does seem like the best option. Nevertheless, I'd prefer not add, for the reasons above.
I'm -0.5 on it.
How would you represent the a long expression @aljazerzen ? I agree the trailing backslashes are ugly! But is there a better way?
FWIW this is longer the case:
But they cannot be function calls in
[]
! Which means that without much hassle, I think we can make this work:
Within lists, we could make this work:
from a
select [
sum
b,
avg
b
]
... and an idea: non-pipelining parenthesis!
from a
select ((
sum
b
))
... and an idea: non-pipelining parenthesis!
The abyss just stared back at me! (niche joke / reference!)
Within lists, we could make this work:
How about for a long expression with operators? I think that's the most likely long expression, rather than function calls. As above, we could allow ending a line with those (though it may create some odd error messages)?
I think that's possible. I'm not sure whether that's better than the trailing slashes.
I'm also OK saying "the current state isn't great. But the options we're looking at aren't good either, we're going to leave this open for a while and reflect, rather than implementing something bad", if we think that reflection might pay off.
Or "we're OK with X, but it's not a priority to implement for us, we'll accept a PR", which is an implicit filter without us getting in the way of anyone.
I think both of those are better than leaving something hanging for ages (as you pointed out on Discord / on the call!)
FWIW, my favourite is probably Flavour 1 and I'm not such a fan of Flavour 4 - it just seems harder to read to me (but I'm just speaking as a user, not an implementer).
Wouldn't Flavour 2 possibly address the "trailing comma" type issue as only the first line is different and that's special anyway. It wouldn't be my first choice as you can't tell on each line that there is more coming but to me it seems acceptable if it deals with the trailing commas issue. How would this flavour deal with commenting things out though?
Would the following work?
from tickets_per_show
filter addthree
\ seats
# \ couches * 2 # couches are double-sized
\ couches * 3 # couches are triple-sized
\ standing / 2 # standing people take less room
\ != max_capacity
@max-sixty 's suggestion of "leave this open and accept a PR" seems good for now.
Would the following work?
from tickets_per_show filter addthree \ seats # \ couches * 2 # couches are double-sized \ couches * 3 # couches are triple-sized \ standing / 2 # standing people take less room \ != max_capacity
This actually might work! It makes the parser ambiguous for one item, but that's OK I think.
Or are there things I'm missing?
~(I'm going to take the executive decision of filling in the +
, which I'm fairly convinced are missing in error and make the examples wrong, since no one corrected me above)~
@max-sixty The reason there are no +
s is because of the addthree
function that was defined higher up (here).
snth is right. I felt that a long expression like seats + couches + standing
risked a sidetrack into "a line ending in + is obviously incomplete" territory, hence why I made up addthree seats couches standing
for the examples.