Match statement
What's up?
Inspired by #2930, I thought a match statement could be a nice feature complementing case.
Motivation
Mapping a performance-friendly data representation to a user-friendly one is a common occurrence in data modeling. PRQL already improves greatly over SQL in that regard with case vs CASE. Still, case can be needlessly verbose when values in a single column are to be mapped to different values.
Example with case
from order
derive order_status_desc = case [
order_status == 0 => "open",
order_status == 1 => "closed",
order_status == 2 => "on hold",
true => "unknown",
]
Example with match
from order
derive order_status_desc = match order_status [
0 => "open",
1 => "closed",
2 => "on hold",
_ => "unknown"
]
_, borrowed from Rust, signifies all other cases for an exhaustive match. As a catch-all it must come last and thus doesn't need a trailing comma. Omitting it produces NULL for unmatched values, just as with the case statement.
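To make the intended semantics concrete, here is a minimal sketch of how the proposed match could be rewritten into the existing case form. It is written in Python purely for illustration and is not how the PRQL compiler works; the function name `desugar_match` and the representation of arms as pairs of source snippets are assumptions made up for this example.

```python
def desugar_match(subject, arms):
    """Rewrite hypothetical `match` arms into the text of a PRQL `case` expression.

    `subject` is the matched expression (e.g. a column name) and `arms` is a
    list of (pattern, result) pairs of PRQL source snippets. The pattern "_"
    is the catch-all. Both the function and this input shape are illustrative
    assumptions, not part of PRQL.
    """
    lines = []
    for index, (pattern, result) in enumerate(arms):
        if pattern == "_":
            # The catch-all is only meaningful as the final arm.
            if index != len(arms) - 1:
                raise ValueError("`_` must be the last arm")
            condition = "true"
        else:
            condition = f"{subject} == {pattern}"
        lines.append(f"  {condition} => {result},")
    # Without a `_` arm no `true` arm is emitted, so unmatched values
    # fall through to NULL, exactly as with a plain `case`.
    return "case [\n" + "\n".join(lines) + "\n]"


if __name__ == "__main__":
    print(desugar_match(
        "order_status",
        [("0", '"open"'), ("1", '"closed"'), ("2", '"on hold"'), ("_", '"unknown"')],
    ))
```

Running it on the arms from the example above prints a case expression equivalent to the first example, which suggests match could be treated as pure syntactic sugar.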
Example with enum as proposed in #2930
enum OrderStatus {
Open
Closed
OnHold
}
from order
derive order_status_desc = match order_status [
OrderStatus.Open => "open",
OrderStatus.Closed => "closed",
OrderStatus.OnHold => "on hold",
_ => "unknown"
]
While on its own, this doesn't bring any advantage over the previous example (it's actually less terse), together with the module proposal #3474, it would provide a nice way to increase DRYness and get rid of magic numbers that are ever so prevalent in SQL.
Hi @ivenw ! Thanks for the issue. We did think about this in https://github.com/PRQL/prql/issues/1286, and we're still open to it. My current thought is to see how folks' needs evolve and then decide whether we add another language feature — I'm a fan of match, though it does have lots of overlap with case....
For sure, expansion of the API surface needs to be carefully considered.
I would say that match falls more into the category of making queries easier to read rather than to write. With case, I, the reader, have to parse the statement line by line to verify that only one column is referenced. With match this is clear from the get-go. It also brings the small benefit of making diffs larger when the logic is changed to reference another column (swapping match for case), which makes the change easier to spot in a PR.
The issue of the enlarged API surface could be met with compiler warnings. When a case statement is detected in which every arm references the same column, the compiler could suggest replacing case with match.
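The check described above could look roughly like the following sketch. It is Python for illustration only, not compiler code: the function name `match_candidate` and the encoding of each arm's condition as a `(column, operator, literal)` triple (or `True` for the default arm) are assumptions, and a real lint would walk the compiler's own AST instead.

```python
from typing import Optional, Sequence, Tuple, Union

# Hypothetical encoding of a `case` arm's condition: either the literal True
# (the default arm) or a (column, operator, literal) triple such as
# ("order_status", "==", 0).
Condition = Union[bool, Tuple[str, str, object]]


def match_candidate(conditions: Sequence[Condition]) -> Optional[str]:
    """Return the column name if the `case` could be written as a `match`.

    A `case` qualifies when every non-default arm is an equality test
    against one and the same column. Otherwise return None.
    """
    columns = set()
    for cond in conditions:
        if cond is True:
            # Default arm; it would become `_` in the match form.
            continue
        if not isinstance(cond, tuple) or cond[1] != "==":
            # Anything other than a plain equality test disqualifies the case.
            return None
        columns.add(cond[0])
    # Exactly one distinct column must be referenced across all arms.
    return columns.pop() if len(columns) == 1 else None


if __name__ == "__main__":
    arms = [("order_status", "==", 0), ("order_status", "==", 1), True]
    column = match_candidate(arms)
    if column is not None:
        print(f"hint: this case only compares `{column}`; consider match")
```

The sketch is deliberately conservative: a single arm that uses another operator or another column suppresses the suggestion.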
Also, a small change to my proposal would slightly strengthen the case for match, I believe: making _, i.e. exhaustive matches, mandatory, so that the statement is explicit about which value is returned when no match is made.
I like this because I think case was just bringing something from the old world of SQL into PRQL instead of bringing match from the new world of modern languages into PRQL.
Range expressions would be cool too, if that is possible, such as 0..9 or "a".."z".