prql
prql copied to clipboard
an exclude clause, that would select all columns except whats specified
Big Query has a SELECT * EXCEPT , but otherwise I don't think it's a widely supported feature.
But it would be a neat feature when and if it's more supported.
One method might be to select into a temp table then alter that table , would take some exploring to see how wildly supported that is.
+1!
We could use something like https://dba.stackexchange.com/a/1958 for other DBs.
We could represent this as a new transformation:
from employees
drop salary
Or a named arg to select
from employees
select except:[salary] [] # Given current function specs, we'd need to still supply the positional arg
The drop clause feels more natural to me however I think we should call it something like exclude so it doesn't get confused with an actual SQL DROP!
I agree that this would be useful, but the problem here is that we cannot (easily) transpile that to SQL. If we have a query like:
from example
select [a, b, c, d, e, f]
exclude [d, b]
That's easy. No problem.
But
from example
exclude [d, b]
requires knowledge about what columns there are in the database.
As max wrote, there are queries that allow you to SELECT * EXCEPT d b, but they require lookups into information_schema.columns or equivalent. We can do that, but we cannot do that reliably for all databases.
Wait, no, I found something:
CREATE TEMPORARY TABLE TempStudents AS SELECT * FROM Students;
ALTER TABLE TempStudents DROP COLUMN comment;
SELECT * FROM TempStudents;
it says that it works for MySQL, but it may work for others.
As discussed on discord:
I think we could do this, and then just fail if we don't know the column names (unless it's a dialect, like BigQuery, which supports this natively).
It would be a bit unpredictable —
select *andselect {list of all columns}wouldn't be equivalent, but I think it would outweigh the benefits
I would hesitate to create a temporary table, I worry it might have quite a bad performance cliff, but we could assess if that's correct.
One options here would be to do:
from employees
select [-salary, -dob]
...meaning "select the columns apart from salary & dob.
Though possibly the - should be outside the list, given it needs to apply to everything, since select [-salary, dob] wouldn't resolve to anything:
from employees
select -[salary, dob]
...though IIUC that would require expanding our parsing slightly to have unary operators operate on lists.
This would give us the ability to do this for the dialects that support it without introducing any new transforms or syntax, since sort uses the syntax, in its case to do descending order.
I don't like first option, because it clashes with selecting a negated column.
You are right that the second option would be hard to parse. Currently this is already syntactically valid expression for:
BinOp:
- left: Ident select
- op: Minus
- right: List
- Ident salary
- Ident dob
What about just having exclude/except transform?
What about just having
exclude/excepttransform?
Yes, no strong view against it specifically. Keeping the cardinality of keywords low means the language doesn't become cluttered. I think we applied that principle very well in how we do sort — it's both concise and clear.
exclude seems like it would be the least used transform, and it's really a different flavor of select, so squeezing it into select would be ideal, depending on how awkward the squeeze would be...
because it clashes with selecting a negated column.
Yes v good point
Currently this is already syntactically valid expression for:
Yes, maaaybe because we parse the first ident separately, it might be possible to parse it. But currently a pipeline contain expr_calls, not func_calls; so I think it would be inconsistent with allowing arbitrary expressions.
Alternatively it might be possible to have a negative look-ahead for a list in a BinOp. But that's more complication. If that's true, then it becomes a question of whether we like the language feature enough to justify the complication.
https://github.com/prql/prql/blob/04f72b96862eb91d186ffeaf772c3c3b1f537dc1/prql-compiler/src/prql.pest#L47
select [*] except:[foo, bar]
?
select [*] except:[foo, bar]?
Yes +0.5!
Or:
select (* -[foo, bar])
?
select [*] except:[foo, bar]
Not bad, +0.5
select (* -[foo, bar])
Not such a fan of this.
Another idea to throw into the mix:
select [^ foo, bar]
like in regex excluded character classes?
Might be a bit too subtle but has advantage that you could possibly reuse it in group clauses?
For example in this Friendlier SQL with DuckDB blogpost the example:
SELECT
* EXCLUDE (cantinas, booths, scum, villainy),
SUM(scum + villainy) as total_scum_and_villainy
FROM star_wars_locations
GROUP BY ALL
-- GROUP BY systems, planets, cities
could possibly be written as:
from star_wars_locations
group [^ cantinas, booths, scum, villlainy] (
total_scum_and_villainy = sum (scum + villainy)
)
select [^ foo, bar]
One downside with this is ^ seems to apply to the first item, but actually applies to all the items. I think this would be unique (both in PRQL or in general symbolic representation) that it applies to the whole list...
select [^ foo, bar]One downside with this is ^ seems to apply to the first item, but actually applies to all the items.
I agree with and that's the first thing I thought as well. We could of course apply the ^ to each item as in
select [ ^foo, ^bar]
but that seems clumsy and unnecessary. Also, I think it would only make sense to have that flexibility if you foresaw ever having to mix inclusions and exclusions but I can't think of a scenario where that would be the case.
I think this would be unique (both in PRQL or in general symbolic representation) that it applies to the whole list...
In terms of JSON (, Python, ...) type lists probably yes, but that is how it is used in regular expressions. For example [^aeiou] means anything other than a vowel and [^0-9a-zA-Z] is anything other than a digit or lower or upper case character. See e.g. Examples of negated character classes or Basic Regular Expressions: Exclusions.
So the [^...] construct is a whole separate construct to [...]. Thinking of it in terms of the ^ negating the surrounding [], it is quite unusual though since we usually have negating operators negating whatever follows them rather than surrounds them.
Following that line of reasoning, then what about
select ^[foo, bar]
or since we already have ! for !=, what about
select ![foo,bar]
?
(I thought about the options before writing it out and I thought I would like the select ![foo, bar] syntax but seeing it on the screen I like it less than I expected. I think I'm probably still tending towards select [^foo,bar] as my favourite. Somehow having the ^ inside the brackets seems to "contain" it more nicely. Anyway, just some personal and completely unscientific observations from my side.)
What do you think?
I really like
select ![foo,bar]
It's simple and unambiguous if you know that ! means negation.
I'm -1 on ^ for similar reasons as max.
My second option would be select [*] except:[foo, bar]
And to be explicit we can have
select ![foo,bar]
but not select -[foo,bar] because ! is not a binary operator, so it can't be confused with a binary expression between select and [foo,bar]
Closed by #1317 and #1484