prql icon indicating copy to clipboard operation
prql copied to clipboard

feat: Support for sql arrays as s[...]

Open rvalek opened this issue 7 months ago • 11 comments

Hi! I have a use-case requiring SQL-level arrays (usually [...] or ARRAY[...] in SQL): https://github.com/PRQL/prql/discussions/5303.

I see that the current PRQL-level arrays written in square brackets are mainly used to represent the list literals in expressions like x IN (1,2,3), and can not be passed into functions to be interpolated into s-strings. Since PRQL doesn't support variadic functions, I couldn't add an array constructor as a pure s-string either.

This PR introduces new syntax: s[...] that compiles into SQL-level array literals, which can also be passed into s-strings to support my original use-case of constructing maps from arrays.

Example usage:

let x = p1 -> s"x({p1})"

from employees
select {
  name,
  # Edge cases
  empty_array = s[],
  single_element = s[42],
  null_element = s[null],
  complex_expressions = s[age + salary, (age * 2) + 1],
  nested_function_calls = s[(min age), (max salary)],
  passing_as_arg = x s[1,2,3],
  nested = s["a", s["b"]]
}

Compiles to:

SELECT
  name,
  [] AS empty_array,
  [42] AS single_element,
  [NULL] AS null_element,
  [age + salary, age * 2 + 1] AS complex_expressions,
  [MIN(age) OVER (), MAX(salary) OVER ()] AS nested_function_calls,
  x([1, 2, 3]) AS passing_as_arg,
  [ 'a',
  [ 'b' ] ] AS nested
FROM
  employees

-- Generated by PRQL compiler version:0.13.4-85-g4dbc79ee-dirty (https://prql-lang.org)

I understand that adding new syntax (and a new SqlArray type) is a big deal, so I don't expect this to be adopted as-is. I'd be happy to refine this to match the design direction that better suits your vision.

I am also not very experienced with both Rust and PRQL, so any criticism is welcome.

Thanks! Robert

UPD: proposal changed -- see https://github.com/PRQL/prql/pull/5312#issuecomment-2950000887

rvalek avatar May 28 '25 19:05 rvalek

Hi,

Thanks for the PR.

[]s were much more prominent before the switch to {} for the tuple syntax. You're right about the use for literals but maybe that should get the special syntax instead? I think it might be worth reconsidering. We'll have to think about what's more common in practice. I'm going to think about this a bit.

snth avatar May 28 '25 20:05 snth

Yes, we could make [] compile to arrays by default, and only compile to () when used with in. Please let me know your thoughts.

rvalek avatar May 28 '25 22:05 rvalek

interesting suggestion!

I'm trying to think whether there's some way of designing normal array syntax to work, without special-casing in

thanks for your excellent PR summary. do you happen to also have concrete examples of cases where the existing approach doesn't work? i.e. "can not be passed into functions to be interpolated into s-strings....Since PRQL doesn't support variadic functions, I couldn't add an array constructor as a pure s-string either."

max-sixty avatar May 29 '25 17:05 max-sixty

Hi Max! Thank you! Here are simple examples of what I tried before making this PR. All of them fail with unexpected array of values (not supported here):

  1. We can't construct SQL-array literals as-is:
from employees
select {
  literal = [1,2],
  with_cols = [name, age]
}
  1. We can't pass the current [...]into functions:
let y = p1 -> p1

from employees
select {
  passing_as_arg = y [1,2]
}
  1. We can't pass it into s-string interpolation too:
let x = p1 -> s"x({p1})"

from employees
select {
  passing_into_s_string = x [1,2]
}

I also couldn't define something like make_array in the std.prql in the compiler, because:

  1. I can't define a prql function with a varying number of parameters. For example: let make_array = *xs -> internal array.make, which would collect all passed arguments into a vector.
  2. I can't overload prql functions by by the number of arguments (which is totally ok by me), so I couldn't do something ugly like:
let make_array = el1 -> s"[{el1}]"
let make_array = el1 el2 -> s"[{el1}, {el2}]"
let make_array = el1 el2 e3 -> s"[{el1}, {el2}, {el3}]"

Are these the examples you were looking for?

Also, once we have a way of constructing arrays, then map construction can be done simply with an s-string: https://github.com/PRQL/prql/discussions/5303


On the previous proposal to merging my s[] functionality with existing [] and special-casing the IN: it seems to make sense semantically, as from the perspective of the PRQL user, there is no benefit of knowing/worrying that SQL has different syntax for true arrays vs. the membership check with IN. This might introduce some quirks in the compiler implementation, though...

rvalek avatar May 30 '25 15:05 rvalek

Hi guys! I think compiling existing [] arrays to SQL arrays by default and to () only for IN is the best approach. I updated the PR -- removing the new s[] syntax, and implementing the above.

Now this:

let x = p1 -> s"x({p1})"

from t
filter (name | in ['q', 'w'])
select {
  empty_array = [],
  single_element = [42],
  null_element = [null],
  complex_expressions = [age + salary, (age * 2) + 1],
  nested_function_calls = [(min age), (max salary ?? 0)],
  passing_as_arg = x [1,2,3],
  nested = ['a', ['b']]
}

compiles to:

SELECT
  [] AS empty_array,
  [42] AS single_element,
  [NULL] AS null_element,
  [age + salary, age * 2 + 1] AS complex_expressions,
  [MIN(age) OVER (), MAX(COALESCE(salary, 0)) OVER ()] AS nested_function_calls,
  x([1, 2, 3]) AS passing_as_arg,
  [ 'a',
  [ 'b' ] ] AS nested
FROM
  t
WHERE
  name IN ('q', 'w')

rvalek avatar Jun 06 '25 17:06 rvalek

@max-sixty @snth what are your thoughts on the updated version? The main caveat I see is that in std.prql, the array type is used to describe inputs to many functions that don't actually take arrays: let min = column <array> -> internal std.min. But this looks to be a bit of debt from the older time which can also be cleaned up.

Meanwhile, I'm adding this to my org's fork to simplify people's life, but I'd prefer not to diverge from the upstream too much, so if you think this is a good addition to PRQL, I could look into adding tests/updating the docs, etc., to merge this.

Thanks!

rvalek avatar Jun 09 '25 12:06 rvalek

Thanks @rvalek . Your examples look great!

Could you please also check the other use of array literals in expression like:

from [{a=1}, {a=2}]

snth avatar Jun 10 '25 20:06 snth

Thank you, @snth! I forgot to check this case. With my changes, from [{a=1}, {a=2}] compiles to:

WITH table_0 AS (
  SELECT
    1 AS a
  UNION
  ALL
  SELECT
    2 AS a
)
SELECT
  a
FROM
  table_0

So, looks like no change there.

rvalek avatar Jun 11 '25 12:06 rvalek

this is v reasonable, I like the newer version!

could we understand why the tests are failing? that's a requirement for merging

max-sixty avatar Jun 11 '25 18:06 max-sixty

Of course. The tests are failing because one of the negative test-cases is no longer returning an error: https://github.com/rvalek/prql/blob/0390bc2e4c4dc649ee1719b9eabd85f811bc7444/prqlc/prqlc/tests/integration/error_messages.rs#L107 That PRQL is now valid and compiles to:

SELECT
  [first_name, last_name]
FROM
  employees

which, I think, is a good thing. I will update this test to no longer expect an error and add more test for the new array capabilities within a few days.

rvalek avatar Jun 12 '25 18:06 rvalek

@max-sixty @snth we are all green now -- updated a couple tests and added one. Please let me know what else is needed.

rvalek avatar Jun 13 '25 17:06 rvalek

very nice! clever approach in the end, thanks for working through it)

(I'm sorry I didn't hit the button sooner, always feel free to @ me after a couple days...)

max-sixty avatar Jun 25 '25 19:06 max-sixty

Really appreciate your guidance here and all your work on PRQL, @max-sixty & @snth! I'll be happy to contribute further as I keep using PRQL.

Also, would you like me to write a short note for this doc page to showcase the new functionality?

rvalek avatar Jun 26 '25 14:06 rvalek