prql icon indicating copy to clipboard operation
prql copied to clipboard

Inconsistent behavior of a single item in `aggregate`

Open max-sixty opened this issue 3 years ago • 11 comments

Currently this works:

from employees
aggregate (sum salary)

...but not with an assignment...

from employees
aggregate (x = sum salary)

Instead, [ ] are required:

from employees
aggregate [x = sum salary]

max-sixty avatar Jun 24 '22 16:06 max-sixty

Well, this is syntax level issue: we allow assigns in lists but not in nested pipelines. It may be possible to allow an assign as a first expression in a pipeline?

I can see into that.

aljazerzen avatar Jun 24 '22 19:06 aljazerzen

Ah, good point.

We removed parenthesized_expr in favor of (...) always being an inline pipeline. I remember thinking that was a good change!

Though it does mean we now have this inconsistency. I'm not sure what the best thing to do is; do we allow for assigns in pipeline — that might make for some odd error messages.

Alternatively we could always ask for a list there. I rather liked the equivalence between a list and a single item; but it's probably worse to have it partly equivalent. WDYT?

max-sixty avatar Jun 24 '22 20:06 max-sixty

I don't think this is even inconsistently - lists and parenthesis are not on the same "level".

Lists pass multiple items into functions. These inner items can be parenthesized:

from employees
derive (sum salary)
derive [(sum salary)]

Equivalence for assigns:

from employees
derive x = (sum salary)

derive [x = (sum salary)]

We could add assigns in parenthesis but i'll have to think about the types of such expressions.

aljazerzen avatar Jun 25 '22 17:06 aljazerzen

I was originally envisaging that we could pass a single item where we could pass in a list, that those would be interchangeable.

Totally agree that they mean different things at the semantic level — so maybe that's an assumption we should drop...

max-sixty avatar Jun 26 '22 21:06 max-sixty

I think I should clarify the docs at https://prql-lang.org/book/examples/list-equivalence.html, and possibly we should think about whether the rules are clear here.

In particular, we get some confusing error messages if we forget to put parentheses around things that work in lists.

Atm the rules are something like:


Assignments need one of these to be true:

  • Rvalue in parentheses — derive x = (sum salary)
  • The assignment in a list — derive [x = sum salary]
  • Rvalue doesn't contain function calls — derive x = salary or derive x = salary ?? foo are valid

Further, any expression containing a pipe needs to be in parentheses — derive [foo | in 5..10] is not allowed and needs to be derive [(foo | in 5..10)]


Here's an example of the currently confusing error message:

from employees
filter foo | in 5..10  # requires parentheses
Error: 
   ╭─[:2:14]
   │
 2 │ filter foo | in 5..10
   ·              ────┬───  
   ·                  ╰───── table s-strings cannot contain interpolations
   · 
   · Help: are you missing `from` statement?
───╯

As above, I think this is quite complicated, and possibly we can try to make the rules simpler, even at the cost of reducing the scope.

max-sixty avatar Dec 20 '22 19:12 max-sixty

+1 for adding the explanation in the previous comment to the "List Equivalence" page in the book.

snth avatar Dec 20 '22 21:12 snth

Maybe rather than saying "args can also be lists" we should say "list args will interpret a non-list as a singleton". Or more academically: we have a implicit cast from any to list.

The way I think about where args are needed can be formulated as: "nested function calls need parenthesis/brackets".

I hope this helps.

aljazerzen avatar Dec 20 '22 21:12 aljazerzen

One thing we might be able to do is allow [foo | in 5..10] — that would mean we could say "an item in list has the same semantics as a parenthesized expression", rather than the additional difference between we currently have.

...though I'm not sure, I'd have to try it and see whether anything breaks.

max-sixty avatar Dec 21 '22 18:12 max-sixty

I see it works, nice generalization.

aljazerzen avatar Dec 22 '22 09:12 aljazerzen

One thing we might be able to do is allow [foo | in 5..10] — that would mean we could say "an item in list has the same semantics as a parenthesized expression", rather than the additional difference between we currently have.

I've done this.


I still think it's confusing that these are allowed:

select x = salary + benefits
# and
select [x = sum salary]

but this isn't:

select x = sum salary

I'm not sure there's much we can do though. We definitely can't allow

select x = salary | sum

...since it's unclear whether sum applies to salary or the whole pipeline.

And even without the pipe, as I wrote in https://github.com/prql/prql/pull/1319, I don't think it's possible to discriminate between:

  • the sum salary as a function in select x = sum salary
  • salaries [==id] as arguments to join in join s=salaries [==id] ...such that we could allow select x = sum salary

So the main options I can see:

  1. Status quo + adding docs about "every function needs to be in parentheses outside of a list", improve error messages (I think the error messages are fairly important, maybe it's a good thing for me to do)
  2. Force everything to be in a list, so select [x = sum salary] is required, eliminating the difference between lists and no-lists
  3. Force all function calls to be in parentheses, so select [x = sum salary] doesn't work and select [x = (sum salary)] is required; also eliminating the difference between lists and no-lists

max-sixty avatar Dec 22 '22 19:12 max-sixty

I'm:

  • +1 on docs & error messages,
  • -1 on removing list coercion,
  • +0.5 on requiring parentheses for function calls. It would be less ergonomic, but more consistent.

aljazerzen avatar Dec 22 '22 19:12 aljazerzen