prql icon indicating copy to clipboard operation
prql copied to clipboard

an exclude clause, that would select all columns except whats specified

Open qharlie opened this issue 3 years ago • 16 comments

Big Query has a SELECT * EXCEPT , but otherwise I don't think it's a widely supported feature.

But it would be a neat feature when and if it's more supported.

One method might be to select into a temp table then alter that table , would take some exploring to see how wildly supported that is.

qharlie avatar Mar 11 '22 23:03 qharlie

+1!

We could use something like https://dba.stackexchange.com/a/1958 for other DBs.

We could represent this as a new transformation:

from employees
drop salary

Or a named arg to select

from employees
select except:[salary] []   # Given current function specs, we'd need to still supply the positional arg

max-sixty avatar Mar 12 '22 01:03 max-sixty

The drop clause feels more natural to me however I think we should call it something like exclude so it doesn't get confused with an actual SQL DROP!

danjones1618 avatar Mar 12 '22 13:03 danjones1618

I agree that this would be useful, but the problem here is that we cannot (easily) transpile that to SQL. If we have a query like:

from example
select [a, b, c, d, e, f]
exclude [d, b]

That's easy. No problem.

But

from example
exclude [d, b]

requires knowledge about what columns there are in the database.

As max wrote, there are queries that allow you to SELECT * EXCEPT d b, but they require lookups into information_schema.columns or equivalent. We can do that, but we cannot do that reliably for all databases.

aljazerzen avatar Apr 01 '22 07:04 aljazerzen

Wait, no, I found something:

CREATE TEMPORARY TABLE TempStudents AS SELECT * FROM Students;
ALTER TABLE TempStudents DROP COLUMN comment;
SELECT * FROM TempStudents; 

it says that it works for MySQL, but it may work for others.

aljazerzen avatar Apr 01 '22 07:04 aljazerzen

As discussed on discord:

I think we could do this, and then just fail if we don't know the column names (unless it's a dialect, like BigQuery, which supports this natively).

It would be a bit unpredictable — select * and select {list of all columns} wouldn't be equivalent, but I think it would outweigh the benefits


I would hesitate to create a temporary table, I worry it might have quite a bad performance cliff, but we could assess if that's correct.

max-sixty avatar Sep 11 '22 21:09 max-sixty

One options here would be to do:

from employees
select [-salary, -dob]

...meaning "select the columns apart from salary & dob.

Though possibly the - should be outside the list, given it needs to apply to everything, since select [-salary, dob] wouldn't resolve to anything:

from employees
select -[salary, dob]

...though IIUC that would require expanding our parsing slightly to have unary operators operate on lists.

This would give us the ability to do this for the dialects that support it without introducing any new transforms or syntax, since sort uses the syntax, in its case to do descending order.

max-sixty avatar Dec 15 '22 19:12 max-sixty

I don't like first option, because it clashes with selecting a negated column.

You are right that the second option would be hard to parse. Currently this is already syntactically valid expression for:

BinOp:
- left: Ident select
- op: Minus
- right: List
  - Ident salary
  - Ident dob

What about just having exclude/except transform?

aljazerzen avatar Dec 15 '22 20:12 aljazerzen

What about just having exclude/except transform?

Yes, no strong view against it specifically. Keeping the cardinality of keywords low means the language doesn't become cluttered. I think we applied that principle very well in how we do sort — it's both concise and clear.

exclude seems like it would be the least used transform, and it's really a different flavor of select, so squeezing it into select would be ideal, depending on how awkward the squeeze would be...

because it clashes with selecting a negated column.

Yes v good point

Currently this is already syntactically valid expression for:

Yes, maaaybe because we parse the first ident separately, it might be possible to parse it. But currently a pipeline contain expr_calls, not func_calls; so I think it would be inconsistent with allowing arbitrary expressions.

Alternatively it might be possible to have a negative look-ahead for a list in a BinOp. But that's more complication. If that's true, then it becomes a question of whether we like the language feature enough to justify the complication.

https://github.com/prql/prql/blob/04f72b96862eb91d186ffeaf772c3c3b1f537dc1/prql-compiler/src/prql.pest#L47

max-sixty avatar Dec 15 '22 23:12 max-sixty

select [*] except:[foo, bar]

?

aljazerzen avatar Dec 16 '22 19:12 aljazerzen

select [*] except:[foo, bar]

?

Yes +0.5!

max-sixty avatar Dec 16 '22 23:12 max-sixty

Or:

select (* -[foo, bar])

?

max-sixty avatar Dec 17 '22 22:12 max-sixty

select [*] except:[foo, bar]

Not bad, +0.5

select (* -[foo, bar])

Not such a fan of this.


Another idea to throw into the mix:

select [^ foo, bar]

like in regex excluded character classes?

Might be a bit too subtle but has advantage that you could possibly reuse it in group clauses?

For example in this Friendlier SQL with DuckDB blogpost the example:

SELECT
    * EXCLUDE (cantinas, booths, scum, villainy),
    SUM(scum + villainy) as total_scum_and_villainy
FROM star_wars_locations
GROUP BY ALL
-- GROUP BY systems, planets, cities

could possibly be written as:

from star_wars_locations
group [^ cantinas, booths, scum, villlainy] (
    total_scum_and_villainy = sum (scum + villainy)
)

snth avatar Dec 20 '22 16:12 snth

select [^ foo, bar]

One downside with this is ^ seems to apply to the first item, but actually applies to all the items. I think this would be unique (both in PRQL or in general symbolic representation) that it applies to the whole list...

max-sixty avatar Dec 20 '22 19:12 max-sixty

select [^ foo, bar]

One downside with this is ^ seems to apply to the first item, but actually applies to all the items.

I agree with and that's the first thing I thought as well. We could of course apply the ^ to each item as in

select [ ^foo, ^bar]

but that seems clumsy and unnecessary. Also, I think it would only make sense to have that flexibility if you foresaw ever having to mix inclusions and exclusions but I can't think of a scenario where that would be the case.


I think this would be unique (both in PRQL or in general symbolic representation) that it applies to the whole list...

In terms of JSON (, Python, ...) type lists probably yes, but that is how it is used in regular expressions. For example [^aeiou] means anything other than a vowel and [^0-9a-zA-Z] is anything other than a digit or lower or upper case character. See e.g. Examples of negated character classes or Basic Regular Expressions: Exclusions.

So the [^...] construct is a whole separate construct to [...]. Thinking of it in terms of the ^ negating the surrounding [], it is quite unusual though since we usually have negating operators negating whatever follows them rather than surrounds them.

Following that line of reasoning, then what about

select ^[foo, bar]

or since we already have ! for !=, what about

select ![foo,bar]

?

(I thought about the options before writing it out and I thought I would like the select ![foo, bar] syntax but seeing it on the screen I like it less than I expected. I think I'm probably still tending towards select [^foo,bar] as my favourite. Somehow having the ^ inside the brackets seems to "contain" it more nicely. Anyway, just some personal and completely unscientific observations from my side.)

What do you think?

snth avatar Dec 20 '22 21:12 snth

I really like

select ![foo,bar]

It's simple and unambiguous if you know that ! means negation.

I'm -1 on ^ for similar reasons as max.

My second option would be select [*] except:[foo, bar]

aljazerzen avatar Dec 21 '22 11:12 aljazerzen

And to be explicit we can have

select ![foo,bar]

but not select -[foo,bar] because ! is not a binary operator, so it can't be confused with a binary expression between select and [foo,bar]

max-sixty avatar Dec 21 '22 18:12 max-sixty

Closed by #1317 and #1484

aljazerzen avatar Jan 12 '23 14:01 aljazerzen