prql icon indicating copy to clipboard operation
prql copied to clipboard

filter based on a list of values

Open markxwang opened this issue 3 years ago • 2 comments

Hi team, Is the SQL IN method currently supported in PRQL? I didn't manage to find a single example in the book.

WHERE column_name IN (value1, value2, ...)

markxwang avatar Sep 27 '22 13:09 markxwang

This would be great.

We probably have to add it into the compiler (can't be done in stdlib); either specific to the in function, or to compile a PRQL list into a SQL list in arbitrary places.

I think either would be fine to start, but maybe @aljazerzen or others have examples where e.g. we'd want to treat a list differently depending on the context.

This might be a reasonable 2nd PR, or a 1st PR for someone familiar with rust.

max-sixty avatar Sep 27 '22 19:09 max-sixty

Thanks for this @max-sixty. I'm using the s string as a workaround for now :). I'm not quite familiar with rust to write a PR, but would definitely like to use PRQL more often to provide useful feedbacks. Really appreciate your effort in this amazing project

markxwang avatar Sep 28 '22 09:09 markxwang

From #2170

Currently there's almost a clear split:

  • Symbols, such as +, *, are all infix — i.e. they go between operands, 5 + 8, 4 * 3

    • Some are prefix, such as ! or - when they're unary
  • Functions are all prefix — i.e. they go before operands — i.e. sum foo, in 5..10 bar

    • And then the pipe symbol lets them go after operands — i.e. foo | sum, bar | in 5..10, which is how we normally use them

...except for and & or, which are infix.

I am wondering if this perspective affects the in operator. (i.e., I have a feeling that the in operator cannot be symbolized)

In R, the in operator is %in%, is there a possibility of something like that?

eitsupi avatar Apr 05 '23 14:04 eitsupi

If it's gonna be named in, this should probably be a function as you note. Which means it would have syntax like this:

from a
filter (x | in [1, 2, 3])

If we come up with a symbol for it to be a binary operator, syntax would be:

from a
filter x % [1, 2, 3]

I really wouldn't like to go with the third option of having %in% that you proposed - it would be hard to parse and I don't find it more readable than the pipeline syntax.

aljazerzen avatar Apr 05 '23 16:04 aljazerzen

Thanks, in function x | in [1, 2, 3] looks good!

eitsupi avatar Apr 05 '23 16:04 eitsupi

Oh, I didn't realize that in function already exists for another use. (#1330)

from employees
filter (age | in ..50)

Perhaps the current in function may need to be renamed?

eitsupi avatar Apr 05 '23 16:04 eitsupi

Perhaps the current in function may need to be renamed?

I think ideally we should support both usages, with the same name...

max-sixty avatar Apr 05 '23 16:04 max-sixty

I also agree with supporting both uses of in with the same name and syntax like x | in [1, 2, 3].

snth avatar Apr 05 '23 20:04 snth

Considering #2612, it would be filter (x | in {1, 2, 3})? Or, it seems [1, 2, 3] is an array, so filter (x | in [1, 2, 3])?

eitsupi avatar May 23 '23 16:05 eitsupi

We want an array (all entries are the same type, no names) which would now be filter (x | in [1, 2, 3]).

We could also support filter (x | in {true, 2, 'hello'}), but that's much harder to implement.

aljazerzen avatar May 23 '23 16:05 aljazerzen