Table definition syntax
I was just writing the docs for tables. Currently we're still using the = syntax. Do we want to keep that, or convert to the new function syntax of ->:
-table a = (
+table a -> (
from employees
take 50
aggregate [s"count(*)"]
)
from a
@aljazerzen what are your thoughts on this pre-0.2?
My opinion is that it makes more sense to have = here.
But I think that eventually we won't need the table keyword, because this will be possible to achieve with a function that takes no arguments and returns a table. Then we will have -> here.
So it would be more consistent to change the table syntax to ->...
In short, I don't have a strong opinion, and I even think that it will not matter in the future.
This issue getting pinged reminded me I actually wanted to change the syntax as:
-table a = (
+let a = (
from employees
take 50
aggregate [s"count(*)"]
)
from a
It signals that you could assign any value to the variable, which would be useful in some cases, like declaring constants.
It also leaves doors open for this lambda function syntax:
let a = (4.7 | round | x -> x + 3)
# these two would be equivalent
let say_hello = x -> f"Hello {x}!"
func say_hello x -> f"Hello {x}!"
I've started a related discussion in #1323 .
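As a loose analogy from outside PRQL (a sketch only, not a statement about how PRQL is or will be implemented), Python already has both forms and treats them as interchangeable. The name `say_hello_func` is hypothetical and exists only so the two definitions can sit side by side:

```python
# Binding an anonymous function to a name, like `let say_hello = x -> ...`
say_hello = lambda x: f"Hello {x}!"


# A dedicated definition form, like `func say_hello x -> ...`
def say_hello_func(x):
    return f"Hello {x}!"


# Both names refer to ordinary callable values with the same behaviour.
greeting_a = say_hello("PRQL")
greeting_b = say_hello_func("PRQL")
```

In this analogy the dedicated form is a convenience rather than a separate concept, which is the sense in which the two PRQL spellings above would be equivalent.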
let say_hello = x -> f"Hello {x}!"
I'll respond to the fuller discussion later, but there's a reasonable case for having this as the canonical form. Maybe it becomes a bit "functional-y", and clearly demarcating things with func and table is helpful for new users. But it would be both complete and coherent to have this form, given the -> syntax
I recently came across @chris-pikul's go-prql again; he makes an interesting point in https://github.com/chris-pikul/go-prql/blob/main/SYNTAX-NOTES.md#function-as-a-constant, which arguably supports the case for a general let syntax
Circling back on this from Discord. For what I would want as a user, I'd vote for just using let:
Table
(and potentially other variables in the future)
let x = (
from employees
filter [country == "USA"]
)
Function
let add = a, b -> a + b
...but I hesitate to push for it, since possibly table and func are helpful for being very obvious what's being defined, particularly for new users? (I worry that between us we end up implementing something between lisp and Haskell... :) )
Although a bit less obvious than table add, we should also support:
let x <table> = ( ... )
... which would ease your concerns a bit, hopefully.
And I do think we should leave func syntax in - even though there is basically no semantic difference to let-lambda.
Although a bit less obvious than table add, we should also support:
let x <table> = ( ... )
Yes great point.
And I do think we should leave func syntax in - even though there is basically no semantic difference to let-lambda.
OK interesting. I'm fine with that. We could also support both, and have the func syntax be sugar for the more functional syntax / let add <func> = a b -> a + b. (Though always hesitant about having two forms of anything, so maybe this isn't a great idea)
...but I hesitate to push for it, since possibly table and func are helpful for being very obvious what's being defined, particularly for new users? (I worry that between us we end up implementing something between lisp and Haskell... :) )
Well put @max-sixty. I actually like let for the pipeline definitions: it looks pretty neat, and I didn't like table much to start with, because I think we still need to clarify the differences between tables, relations, transforms and pipelines (it seems there are fewer than 4 distinct concepts here, so which ones are the same?).
Anyway, that's just me. I also really like the functional programming paradigm, but I worry that it's not a great fit for our target audience. My sense is that we're aiming, at least in part, at analysts, who in my understanding have less programming knowledge than SWEs. Probably more people use SQL than most programming languages; many Excel, Power BI and Tableau users, for example, might use some SQL, and I think PRQL would be a great fit for them. We should take care not to make PRQL too esoteric for them.
We previously mentioned having a "novelty budget". I think we should similarly consider a "changes budget", i.e. not make too many changes unnecessarily. We are still in the pre-1.0 stage so this is the time to make changes but we should use this sparingly.
I actually think we should explore the links to functional programming more, as I believe there are probably also links to monads, but I would ask that we explore those before we make this (IMHO) rather large change.
So I quite like the look of
let oldest_employees = (
from employees
sort birth_date
take 3
)
from oldest_employees
But then what about
let oldest_3 = (
sort birth_date
take 3
)
Can you do
from oldest_3
?
Presumably not.
But
from employees
group [city] (oldest_3)
should work while
from employees
group [city] (oldest_employees)
shouldn't.
What's the difference between these to a new user? Why can you use let to define custom relations (pipelines in my lingo) but not custom transforms?
That's why I've been trying to argue to have separate keywords for these and then tighten up our usage of the terms in the book/documentation. For example the page about group says that group takes a "pipeline" as a second parameter. What exactly is a pipeline?
In terms of the previous examples, the following works in the current playground
table oldest_employees = (
from employees
sort birth_date
take 3
)
from oldest_employees
but the second example has to be written as
func oldest_3 rel -> (
rel
sort birth_date
take 3
)
from employees
group [city] (oldest_3)
which works.
My point here is that you might start out with
from employees
group [city] (
sort birth_date
take 3
)
and then think to yourself, "why not factor out group argument to reuse that part with other groupings", and with the let syntax it's not clear why the following wouldn't work:
let oldest_3 = (
sort birth_date
take 3
)
from employees
group [city] (oldest_3)
I agree on all examples of what should work and what should not!
Why can you use let to define custom relations (pipelines in my lingo) but not custom transforms?
Because they are not implemented yet :D
and then think to yourself, "why not factor out group argument to reuse that part with other groupings"
Exactly! Every expression is a value that can be extracted into a let - including things that evaluate to a function!
For example the page about group says that group takes a "pipeline" as a second parameter. What exactly is a pipeline?
Oh, right - with my recent ideas around naming things, that should be changed to "group takes a function (usually expressed as a pipeline) as the second parameter"
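The relation-versus-transform distinction in the examples above can be illustrated with a rough Python analogy. This is not PRQL and says nothing about how the compiler works; it models a relation as a list of dicts and a transform as a function from relation to relation. The sample rows and the names `oldest_n` and `group` are hypothetical stand-ins chosen for this sketch.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical sample relation: a list of rows.
employees = [
    {"name": "Ann", "city": "Oslo", "birth_date": "1970-01-01"},
    {"name": "Bob", "city": "Oslo", "birth_date": "1980-05-05"},
    {"name": "Cy", "city": "Rome", "birth_date": "1965-03-03"},
    {"name": "Di", "city": "Oslo", "birth_date": "1990-09-09"},
]


def oldest_n(n):
    """Return a transform (relation -> relation): sort by birth_date, take n."""
    def transform(rel):
        return sorted(rel, key=itemgetter("birth_date"))[:n]
    return transform


def group(rel, key, transform):
    """Apply a transform separately to each group of rows sharing `key`."""
    out = []
    rows = sorted(rel, key=itemgetter(key))
    for _, rows_in_group in groupby(rows, key=itemgetter(key)):
        out.extend(transform(list(rows_in_group)))
    return out


# Both bindings are plain values: one is a function, the other a relation.
oldest_2 = oldest_n(2)                   # a transform
oldest_employees = oldest_2(employees)   # a relation

# A transform is accepted where a function is expected.
per_city = group(employees, "city", oldest_2)

# A relation is not: it cannot be applied to each group.
try:
    group(employees, "city", oldest_employees)
    relation_rejected = False
except TypeError:
    relation_rejected = True
```

Under this reading, a single binding form can hold either kind of value, and whether a name is usable after `from` or inside `group` follows from what the value is rather than from the keyword that introduced it.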
@aljazerzen As per our discussions on Discord, my understanding is that all of my examples above now work with your latest changes, which makes the let syntax very compelling.
Also, this would currently be for relations and transforms, while we're still keeping the old func my_func a b -> ... syntax for functions.
Thank you!