prql
prql copied to clipboard
feat: Remove `func` keyword
~I'm super confused by this. The only change I made is to comment out the lexing of "func". But as a result, a huge number of tests fail; most queries can no longer be compiled. I don't see how that's possible — the AST that's being passed back from prql-parser surely doesn't change. (As expected, the only test in prql-parser fails is one that contained func, which I've commented out — which suggests that the AST likely doesn't change). Am I missing something very obvious? Is there dark magic in our codebase?~
OK it was because stdlib was using it!
What do folks think about removing func now that we have line continuations? Exactly one way to do things etc etc...
Removing dependence on a keyword for multi-line declaration is cool, but there was another motive for func keyword.
I've re-added func keyword because there were cases where parsing was very close to being ambiguous. I was even thinking about making it required.
A case of ambiguity with param defaults:
let my_func = a b c:x -> x -> a + c b
A case of ambiguity with returning a function:
let my_func = a -> x -> a + x
The general problem (and potential performance sink) is that when the parser encounters a func call a b c, it cannot know if it is actually a func call, or a func definition.
Because of precedence, it first has to assume that it is a func definition and only when it does not find a trailing ->, it backtracks and parsers a b c as expressions, composing a function call.
Could we instead make it right-associative?
e.g. with:
let f = x y -> x z -> x + y + z
If -> is right associative, then this would be parsed as:
let f = x y -> (x z -> x + y + z)
If it's left associative, it would be parsed as:
let f = (x y -> x) z -> x + y + z
...which seems wrong.
To take your examples:
let my_func = a b c:x -> (x -> a + c b) # this applies with no parentheses too
vs.
let my_func = a b -> (c:x -> a + c b)
and
let my_func = a -> x -> a + x # this applies with no parentheses too
vs.
let my_func = a -> (x -> a + x)
?
Oh, we could absolutely make it left or right associative (one of these is the current behavior), but my point is that parsing this is much harder and less performant than it needs to be.
This applies to all function calls: when parsing join albums side:left (==album_id), the parser has to traverse all the way until ( before it can conclude that this is not a function declaration.
I'm all for minimal grammar, but I do think having a func keyword would benefit performance and ambiguity.
(even with the keyword, we still need to decide on left/right-associativity)
I was less keen on the func keyword, but not immovably so.
I agree it slows down the parsing a bit. But the parsing is still pretty fast! And at least how I understand it, it's not ambiguous to humans? Though possibly that's because we so rarely see nested function definitions.
I also really like the clean aesthetics of the current form, and only having one form of function definition. Though aesthetics aren't that good a reason, and the latter point would hold if we mandated func...
Presumably a language such as Haskell is similar? i.e. f x y = x + y could be lots of things until the =?
Presumably a language such as Haskell is similar? i.e. f x y = x + y could be lots of things until the =?
I don't know haskell, but I don't think that's the case. Aren't all terms before = always function name and its args?
Aren't all terms before
=always function name and its args?
Yes but same as with PRQL — until we get to the = (or it our case, the ->) — it could be lots of things.
In particular f could be a function being applied to two arguments, x and y?
I've looked into Haskell and it does not have this problem.
That's because f x y = ... cannot be an expression, only a statement. This means that in this statement:
a = f x y
... f x y cannot be a start of a function declaration, no matter what we add onto the line.
For anonymous functions, Haskell does have a leading denomination:
a = \x y -> x + y
I know that this might seem like a detail, but I can have major performance implications. An extreme example would be:
let x = a b:(1 + 2 + ...) c:(1 + 2 + ...) d:(1 + 2 + ...) ... z:(1 + 2 + ...)
Here, parser would first try to parse this as a function declaration - all of the parameters and all of their default values, only to encounter an end of the line, conclude that this is not a function declaration, throw this all away and start again, parsing as a function call.
I wish we could reuse the parsed args, but they do a slightly different syntax (type annotations), so this would be hard to pull off.
That's because
f x y = ...cannot be an expression, only a statement.
until we get to the
=(or it our case, the->) — it could be lots of things.
But f x y can be lots of things, no? Before we get to the =?
In Haskell, I don't think so, no. (I have a very limited knowledge of Haskell, correct me if I'm wrong).
If I understand correctly, a statement has to start with variable / function name, followed by parameters, followed by =.
What do folks think about removing
funcnow that we have line continuations? Exactly one way to do things etc etc...
@max-sixty This discussion so far has been highly technical, targeted at people who're deeply familiar with PRQL innards. As someone who might wind up writing/tweaking documentation for this, I would enjoy more explanation:
- Where is
funcused now? I thought it had been replaced bylet... - What does this have to do with line continuations?
Update: A few sentences of context would be helpful. Thanks, as always.
If I understand correctly, a statement has to start with variable / function name, followed by parameters, followed by
=.
Sorry to belabor the point, but to ensure we're sharing understanding on this example. IIUC (also not an expert), this is valid Haskell:
f :: Int -> Int -> Int
f a b = a + b -- Function definition
...but so is...
x = 10
y = 20
f x y -- Function application
(even though that final statement isn't doing anything, it's still valid IIUC)
The point re before the = is important — i.e. once we get to the =, then we know it's a definition — in the same way once we get to the ->, we know it's a definition.
Does that make sense? (If I'm correct it doesn't mean that we have remove func obv! Instead I wanted to understand where we were vs. other languages.)
Where is
funcused now? I thought it had been replaced bylet...
It broadly has! But it's still valid, and allows multi-line function definitions (check out PR for examples). So now we have line wrapping, one proposal is to remove func. Another proposal is to bring func back, to make the parser simpler...