Match statement

Open ivenw opened this issue 2 years ago • 3 comments

What's up?

Inspired by #2930, I thought a match statement could be a nice feature complementing case.

Motivation

Mapping a performance-friendly data representation to a user-friendly one is a common occurrence in data modeling. PRQL already improves greatly over SQL in that regard with case vs CASE. Still, even case can be needlessly verbose when values in the same column are mapped to different values.

Example with case

from order
derive order_status_desc = case [
    order_status == 0 => "open",
    order_status == 1 => "closed",
    order_status == 2 => "on hold",
    true => "unknown",
]

Example with match

from order
derive order_status_desc = match order_status [
    0 => "open",
    1 => "closed",
    2 => "on hold",
    _ => "unknown"
]

_, borrowed from Rust, signifies all other cases for an exhaustive match. As a catch-all, it must come last and thus doesn't have to be followed by a comma. Omitting it produces NULL, just as with the case statement.
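For comparison, here is a minimal sketch of the Rust construct the `_` arm is borrowed from. The function name `order_status_desc` is made up for illustration. One difference from the proposal: Rust rejects a non-exhaustive match on an integer at compile time, so the `_` arm cannot be omitted here, whereas the proposed PRQL match would fall back to NULL.

```rust
/// Maps a numeric order status to a readable label.
/// `order_status_desc` is a hypothetical name used only for this sketch.
fn order_status_desc(order_status: i64) -> &'static str {
    match order_status {
        0 => "open",
        1 => "closed",
        2 => "on hold",
        // Catch-all arm: covers every value not listed above.
        _ => "unknown",
    }
}
```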

Example with enum as proposed in #2930

enum OrderStatus {
    Open
    Closed
    OnHold
}

from order
derive order_status_desc = match order_status [
    OrderStatus.Open => "open",
    OrderStatus.Closed => "closed",
    OrderStatus.OnHold => "on hold",
    _ => "unknown"
]

While on its own, this doesn't bring any advantage over the previous example (it's actually less terse), together with the module proposal #3474, it would provide a nice way to increase DRYness and get rid of magic numbers that are ever so prevalent in SQL.

ivenw avatar Oct 26 '23 20:10 ivenw

Hi @ivenw! Thanks for the issue. We did think about this in https://github.com/PRQL/prql/issues/1286, and we're still open to it. My current thought is to see how folks' needs evolve and then decide whether we add another language feature. I'm a fan of match, though it does have lots of overlap with case...

max-sixty avatar Oct 26 '23 23:10 max-sixty

For sure, expansion of the API surface needs to be carefully considered.

I would say that match falls more in the category of making queries easier to read rather than to write. With case I, the reader, have to parse the statement line by line to ensure that only one column is referenced. With match this is clear from the get-go. It also brings the small benefit of making diffs larger when the logic is changed to reference another column (swapping match for case), which makes the change easier to spot during PR review.

The issue of the enlarged API surface could be met with compiler warnings. When a case statement is detected in which every arm references the same column, the compiler could suggest replacing case with match.
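A rough sketch of how such a lint might work, written in Rust. Everything here is an assumption for illustration: `Cond` and `suggest_match` are hypothetical names and do not reflect the actual PRQL compiler's AST. The idea is just to walk the arms, require each condition to be `column == literal` (or a trailing `true`), and check that the column never changes.

```rust
/// Hypothetical, heavily simplified view of a `case` arm's condition.
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
enum Cond {
    /// `column == literal`
    Eq { column: String, literal: String },
    /// The catch-all `true` condition.
    Always,
    /// Any other expression (comparisons, function calls, ...).
    Other,
}

/// Returns a suggestion if every non-catch-all arm tests the same column
/// for equality; returns `None` otherwise.
fn suggest_match(conds: &[Cond]) -> Option<String> {
    let mut column: Option<&str> = None;
    for cond in conds {
        match cond {
            Cond::Eq { column: c, .. } => match column {
                // First equality arm: remember its column.
                None => column = Some(c.as_str()),
                // Same column as before: keep going.
                Some(prev) if prev == c.as_str() => {}
                // A different column: `match` would not apply.
                Some(_) => return None,
            },
            // A catch-all would map to `_` in a match.
            Cond::Always => {}
            // Anything else cannot be expressed as a simple match arm.
            Cond::Other => return None,
        }
    }
    column.map(|c| {
        format!("every arm compares `{}`; consider `match {} [...]`", c, c)
    })
}
```

For the case example at the top of this issue, the three `order_status == ...` arms plus the `true` arm would produce a suggestion; a case that mixes columns, or uses any other kind of condition, would produce none.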

Also, a small change to my proposal would slightly strengthen the case for match, I believe: making _, i.e. exhaustive matches, mandatory, so that the statement is explicit about which value is returned when no match is made.

ivenw avatar Oct 27 '23 07:10 ivenw

I like this because I think case was just bringing something from the old world of SQL into PRQL instead of bringing match from the new world of modern languages into PRQL.

It would be cool to support range expressions too, if that is possible, such as 0..9 or "a".."z".
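As a point of reference, this is roughly what range arms look like in a Rust match. The function name `classify` is made up for this sketch. Note that Rust range patterns use the inclusive `..=` form; how a PRQL match would treat the bounds of `0..9` is not settled in this thread, so the inclusive reading here is only an assumption.

```rust
/// Classifies a character using range patterns.
/// `classify` is a hypothetical name used only for this sketch.
fn classify(c: char) -> &'static str {
    match c {
        // Inclusive range: matches '0' through '9'.
        '0'..='9' => "digit",
        // Inclusive range: matches 'a' through 'z'.
        'a'..='z' => "lowercase letter",
        // Catch-all for everything else.
        _ => "other",
    }
}
```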

vanillajonathan avatar Nov 22 '23 15:11 vanillajonathan