prql icon indicating copy to clipboard operation
prql copied to clipboard

Syntax to break an expression over multiple lines?

Open ajfriend opened this issue 2 years ago • 19 comments

The PRQL syntax currently usually (with the exception of lists?) treats newlines as terminating characters for statements. (And please do correct me if I'm wrong about this!)

For long expressions (and perhaps also statements?), it would be nice to have a syntax that would allow users to break that expression up over multiple lines.

This idea is motivated by the discussion in https://github.com/prql/prql/issues/788.

Is there a current way to do this? If not, would parentheses be the natural way to express this?

ajfriend avatar Jul 17 '22 03:07 ajfriend

This is a good point.

I don't think we should need it that often — one of the benefits of PRQL should be to split up logic into modular pieces, represented by transforms on separate lines.

But it would be presumptuous to assume we can always do this, and #788 is a good case — even if the "split up into multiple filters" suggestion is reasonable, it doesn't cover the case of a long list of or-ed conditions.

Newlines have an important semantic meaning, so we've "used up" that syntax. So a couple of options:

  • Use \ to allow line breaks
  • Find the cases that force long lines and offer a solution to them (e.g. an any function to take a list of conditions and string them together with ors)

Any thoughts?

max-sixty avatar Jul 18 '22 15:07 max-sixty

My comment around parentheses was wondering if you could do something like what Python does to "free up" the newline syntax inside of parentheses: Screen Shot 2022-07-18 at 10 50 55 AM

ajfriend avatar Jul 18 '22 17:07 ajfriend

\ also seems like a decent solution to me

mklopets avatar Jul 18 '22 21:07 mklopets

Yes, interesting. So we could have a rule that finishing the line on an operator (e.g. + / or / etc) causes a line continuation.

Are there any instances where we want to finish on an operator?

One concern is that while editing a line in the middle of a query, the error messages might be quite confusing:

from foo
derive bar = baz or<cursor>
aggregate (
  max fuz
)

...would this have a confusing error message saying "baz or aggregate is invalid"?

max-sixty avatar Jul 19 '22 04:07 max-sixty

Yes, interesting. So we could have a rule that finishing the line on an operator (e.g. + / or / etc) causes a line continuation.

But that's maybe valid only inside a set of parens? (I'm not sure what the full Python parsing logic is.) Additionally, it might also be valid if the next line starts with an operator, which is that last Python example (inside parens).

ajfriend avatar Jul 19 '22 04:07 ajfriend

...would this have a confusing error message saying "baz or aggregate is invalid"?

"Unexpected newline"?

ajfriend avatar Jul 19 '22 04:07 ajfriend

"Unexpected newline"?

The challenge is that it wouldn't be parsed as a newline, it would be parsed as derive bar = baz or aggregate (, unless I'm misunderstanding something?

But that's maybe valid only inside a set of parens?

Yes, this could be possible. It starts to get a bit complicated — copy something from inside parens to outside parens and it doesn't work any longer.

But python is good prior art, I think this should be OK.

Additionally, it might also be valid if the next line starts with an operator, which is that last Python example (inside parens).

Yes, good idea. We'll have to see whether pest supports this, I think it should be possible with positive predicates

max-sixty avatar Jul 19 '22 06:07 max-sixty

Cool! And to be clear, I'm just offering my random thoughts as a fan in the peanut gallery. Appreciate all the work you've already done!

ajfriend avatar Jul 19 '22 06:07 ajfriend

Cool! And to be clear, I'm just offering my random thoughts as a fan in the peanut gallery. Appreciate all the work you've already done!

Thoughts generally welcome, and in the specific case they're also great ideas @ajfriend !

max-sixty avatar Jul 19 '22 07:07 max-sixty

I'd argue against trying to parse newlines after operators or inside parenthesis. I like \ far more, regardless of the fact that I find it quite ugly.

That's because newline (as a token) is already "used up" on higher levels of parsing, as @max-sixty said. Precedence goes roughly like:

newline, |
and, or
<, >, <=, >=
+, -
*, /
(), .., []

Simplified, tokens are first split by newlines and |, then by and and or, and so on.

Now, if we would want to ignore newlines in () or after +, we'd have to parse () and + higher in precedence list, which would add a lot of complexity.

But why can python support this? Python has two precedence lists: one for expressions and one for statements. Newline is a part of statement precedence list and it means "end of statement". And because statements cannot be part of expressions, and because ( indicates unfinished expression, Python can only use expression precedence list and resolve newline to mean "whitespace".

Back to PRQL, newlines can be part of expressions - actually the whole pipeline is an expression! That's why even in (), newlines must mean "pipe".

I came up with another idea: newlines meaning "pipe" make sense only in pipelines. Pipelines can be just top-level plain function calls, or function calls wrapped in (). But they cannot be function calls in []! Which means that without much hussle, I think we can make this work:

derive [
  some + long + expr + 
    spans + multiple + lines
]

or

derive [some + long + expr + 
    spans + multiple + lines]

aljazerzen avatar Aug 04 '22 07:08 aljazerzen

I have a question, not about future multiline syntax, but about a potential workaround: perhaps in Prql 0.3.1, we can use currying to add arguments one at a time. Perhaps not.

This is not meant as a derailing feature request, only as a "can this workaround be made to work" query. If it can't, no worries.

Alright, the would-be workaround:

# This function takes three arguments, so maybe you want to write it on multiple lines
func addthree<column> a b c -> s"a + b + c"

# 'arg x' takes a curried function and adds x to the end of its argument list.
# addthree a | arg b  -> addthree a b
# because newlines are equal to pipes, this allows building up a function's arguments over multiple lines
func arg myarg myfunc -> ( myfunc myarg )

from mytable
derive [
  my_thing = (  # a multiline pipe that builds up a function call
    addthree apples  # addthree apples _ _
    arg bananas  # addthree apples bananas _
    arg citrus  # `addthree apples banana citrus`, a.k.a. `apples + bananas + citrus`
  )
]

Compiling this with 0.3.1 gives an error (below) that I think I can summarize as "I expected a column expression, but I found a function with signature infer -> infer". Is it possible to write the code differently so it works in 0.3.1? If not, no matter. Thanks for all you've done and are doing on PRQL!

# Linebreaks added. Character span not accurate.
Error {
  span: Some(span-chars-110-176),
  reason: Expected {
    who: Some("function std.derive, param `columns`"),
    expected: "type `column`",
    found: "type `func infer -> infer`"
  },
  help: None
}

sietse avatar Dec 23 '22 16:12 sietse

Woah, a great workaround using currying! Maybe it was worth implementing :smile:

Your example should work and the fact that it doesn't is a bug. I'll look into it. There seems to be a problem with pipelines, because normal function calls do work (but defeats the whole purpose of the workaround):

func addthree<column> a b c -> s"{a} + {b} + {c}"

func arg myarg myfunc -> ( myfunc myarg )

from mytable
derive [
  my_thing = (
    arg "citrus" (arg "bananas" (addthree "apples"))
  )
]

aljazerzen avatar Dec 23 '22 16:12 aljazerzen

Cancel that, even my example produces a wrong output.

aljazerzen avatar Dec 23 '22 18:12 aljazerzen

Well, thank you very much for taking a look. PRQL is a beautiful thing. Have a good slide into the new year!

sietse avatar Dec 29 '22 15:12 sietse

A thought about backslash \ for line continuations: it can interact badly with line comments.

  • In languages where \ at the end of a line indicates a line continuation ...
  • ... and '#' comments all characters until the end of the line ...
  • ... in those languages, you can't have line comments on multiline expressions.

For example, this is invalid bash:

tar \
  -x  # extract \
  -z  # gzipped \
  -f  # read from file \
  myfile.tar.gz

and so is this:

mycommand \
  | head --bytes=-2  # remove spurious CR LF from end \
  | tail +2  # remove header 

Some potentional ways to allow comments AND line continuations, if that is important to you.

Contrived example featuring addthree. Find shows with free places, by adding up how many places each kind of ticket represents, and then comparing that to the theatre's capacity.

Flavour 1: 'backslash EOL' for line continuation. 'backslash space comment EOL' also for line continuation. 'EOL' for pipe.

from tickets_per_show
filter addthree \
    seats \
    couches * 2 \   # couches are double-sized
    standing / 2 \  # standing people take less room
    != max_capacity

Flavour 2: 'EOL space backslash' for line continuation. 'EOL' for pipe.

In other words, line that starts with spaces + backslash gets glued to previous line, after comment removal.

from tickets_per_show
filter addthree
    \ seats
    \ couches * 2   # couches are double-sized
    \ standing / 2  # standing people take less room
    \ != max_capacity

Flavour 3: infix comments

from tickets_per_show
filter addthree \
    seats \
    couches * 2   /* couches are double-sized */ \
    standing / 2  /* standing people take less room */ \
    != max_capacity

Flavour 4: line continuations allow separate-line comments, not end-of-line comments,

from tickets_per_show
filter addthree \
    seats \
    # couches are double-sized
    couches * 2   \
    # standing people take less room
    standing / 2  \
    != max_capacity

sietse avatar Dec 29 '22 16:12 sietse

Good breakdown of options for the backslash escaping route. I prefer flavour 4 the most.

But as I said, backslash looks ugly (that's just my preference) and has to be adjusted when adding a new line at the end. It's like a trailing comma. And we cannot allow trailing backslashes...

aljazerzen avatar Dec 30 '22 09:12 aljazerzen

I agree for Flavour 4! (I think the example is missing +s — is that right? Or I'm misunderstanding the example.)

Overall I'd be fine with this; it doesn't conflict with other syntax. I think we should try and reduce the amount it's needed — but sometimes it will be, as the example shows.

And agree with @aljazerzen on this point; specifically that it's not "composable", in the same way as a trailing comma isn't:

has to be adjusted when adding a new line at the end. It's like a trailing comma. And we cannot allow trailing backslashes...

If I can suggest something to the design — trailing spaces after the backslash should be ignored. In bash, \ does not count as a trailing backslash, which can be really confusing.

max-sixty avatar Dec 31 '22 00:12 max-sixty

As discussed on Discord — are we broadly in agreement on this option?

Flavour 4: line continuations allow separate-line comments, not end-of-line comments,

from tickets_per_show
filter addthree \
    seats \
    # couches are double-sized
    couches * 2   \
    # standing people take less room
    standing / 2  \
    != max_capacity

Note:

(I think the example is missing +s — is that right? Or I'm misunderstanding the example.)

max-sixty avatar Jan 09 '23 07:01 max-sixty

This does seem like the best option. Nevertheless, I'd prefer not add, for the reasons above.

I'm -0.5 on it.

aljazerzen avatar Jan 09 '23 07:01 aljazerzen

How would you represent the a long expression @aljazerzen ? I agree the trailing backslashes are ugly! But is there a better way?

FWIW this is longer the case:

But they cannot be function calls in []! Which means that without much hassle, I think we can make this work:

max-sixty avatar Jan 09 '23 18:01 max-sixty

Within lists, we could make this work:

from a
select [
  sum
  b,
  avg
  b
]

... and an idea: non-pipelining parenthesis!

from a
select ((
  sum
  b
))

aljazerzen avatar Jan 09 '23 19:01 aljazerzen

... and an idea: non-pipelining parenthesis!

The abyss just stared back at me! (niche joke / reference!)

Within lists, we could make this work:

How about for a long expression with operators? I think that's the most likely long expression, rather than function calls. As above, we could allow ending a line with those (though it may create some odd error messages)?

I think that's possible. I'm not sure whether that's better than the trailing slashes.


I'm also OK saying "the current state isn't great. But the options we're looking at aren't good either, we're going to leave this open for a while and reflect, rather than implementing something bad", if we think that reflection might pay off.

Or "we're OK with X, but it's not a priority to implement for us, we'll accept a PR", which is an implicit filter without us getting in the way of anyone.

I think both of those are better than leaving something hanging for ages (as you pointed out on Discord / on the call!)

max-sixty avatar Jan 09 '23 19:01 max-sixty

FWIW, my favourite is probably Flavour 1 and I'm not such a fan of Flavour 4 - it just seems harder to read to me (but I'm just speaking as a user, not an implementer).

Wouldn't Flavour 2 possibly address the "trailing comma" type issue as only the first line is different and that's special anyway. It wouldn't be my first choice as you can't tell on each line that there is more coming but to me it seems acceptable if it deals with the trailing commas issue. How would this flavour deal with commenting things out though?

Would the following work?

from tickets_per_show
filter addthree
    \ seats
    # \ couches * 2   # couches are double-sized
    \ couches * 3   # couches are triple-sized
    \ standing / 2  # standing people take less room
    \ != max_capacity

@max-sixty 's suggestion of "leave this open and accept a PR" seems good for now.

snth avatar Jan 09 '23 20:01 snth

Would the following work?

from tickets_per_show
filter addthree
    \ seats
    # \ couches * 2   # couches are double-sized
    \ couches * 3   # couches are triple-sized
    \ standing / 2  # standing people take less room
    \ != max_capacity

This actually might work! It makes the parser ambiguous for one item, but that's OK I think.

Or are there things I'm missing?

~(I'm going to take the executive decision of filling in the +, which I'm fairly convinced are missing in error and make the examples wrong, since no one corrected me above)~

max-sixty avatar Jan 09 '23 22:01 max-sixty

@max-sixty The reason there are no +s is because of the addthree function that was defined higher up (here).

snth avatar Jan 09 '23 22:01 snth

snth is right. I felt that a long expression like seats + couches + standing risked a sidetrack into "a line ending in + is obviously incomplete" territory, hence why I made up addthree seats couches standing for the examples.

sietse avatar Jan 10 '23 10:01 sietse