prql icon indicating copy to clipboard operation
prql copied to clipboard

filter foo LIKE "%abc%"?

Open richb-hanover opened this issue 3 years ago • 14 comments
trafficstars

I frequently rely on the SQL LIKE operator to match rows. Is there an equivalent way to say this in PRQL? Thanks.

richb-hanover avatar Nov 12 '22 01:11 richb-hanover

It's a good question; I also use LIKE given its simplicity.

It's not that extensible, unlike regex, so one option is to exclusively use regex; https://github.com/prql/prql/issues/562

But regex is not completely trivial, and much of the time we're just asking "is this string in this other string?". Possibly we start with the regex impl and see how that goes. As ever, open to thoughts on this.

max-sixty avatar Nov 12 '22 22:11 max-sixty

It's not that extensible, unlike regex...

Yeah, but ... Everyone learns about LIKE %string% in their first week of SQL class. It's real easy to use, and is remarkably useful.

I'm very comfortable with creating complex regexes if necessary.

But I would argue that LIKE goes a long way toward solving the problem, i.e., your example: "is this string in this other string?". (I think) it has the same semantics as foo == 'abc', so the parser doesn't change much, and we hand off processing to SQL.

It believe it would be useful to include it. Thanks.

richb-hanover avatar Nov 12 '22 23:11 richb-hanover

@max-sixty So I figured an S-String could help me out. So I created this function:

func like fld str -> s'{fld} like "%{str}%"'

from foo  
select bar
filter (like bar "stuff")

It almost worked. But the WHERE clause's string has extra 's in it...

SELECT
  bar
FROM
  foo
WHERE
  bar like "%'stuff'%"

I think the WHERE clause should be bar like "%stuff%" (no ')

richb-hanover avatar Dec 15 '22 20:12 richb-hanover

I agree this is quite confusing and I had to take a few minutes to work out what was going on.

I think we want:

- func like fld str -> s'{fld} like "%{str}%"'
+ func like fld str -> s"{fld} like '%' || {str} || '%' "
SELECT
  bar
FROM
  foo
WHERE
  bar like '%' || 'stuff' || '%'

The issue here is that "stuff" is a string, and so gets placed in the s-string as a string — i.e. interpolating both the s-string and the string within the s-string — and so gets quotes around it.

I'm not sure if that explanation makes it more clearer or more confusing though...! But the example I think should work, at least...

max-sixty avatar Dec 15 '22 23:12 max-sixty

Ahah! So IIUC, an S-string interpolates a table/column name by dumping that string into the S-string. An S-string interpolates a string by wrapping it in ' ... '. (True? How does this get into the documentation?)

In an one-out-of-one test, func like fld str -> s"{fld} like '%' || {str} || '%' " works great for SQLite. Thanks.

richb-hanover avatar Dec 16 '22 04:12 richb-hanover

Follow-up thought: This is confusing - especially requiring people to keep track of the difference between a column name and a string.

What bad thing would happen if someone wrote func like fld str -> s'{fld} like "%{str}%"' and PRQL never added ' around the string, and produced bar like "%stuff%"

richb-hanover avatar Dec 16 '22 05:12 richb-hanover

What bad thing would happen if someone wrote func like fld str -> s'{fld} like "%{str}%"' and PRQL never added ' around the string, and produced bar like "%stuff%"

But then how would someone express a string? e.g. s"{col} = {x}" — sometimes x` would be a column and sometimes a string — we wouldn't want to lose the string there.

I agree it's complicated; I think in this case it's possibly warranted in its complication — I expect "interpolate a string within an s-string" won't be a regular thing. Though as ever I'd be very happy to be wrong! We shall see...

I'll close but feel free to follow-up

max-sixty avatar Dec 17 '22 01:12 max-sixty

Ah, actually this issue is about the language feature, not the s-string question, reopening

max-sixty avatar Dec 17 '22 06:12 max-sixty

Yeah, but ... Everyone learns about LIKE %string% in their first week of SQL class. It's real easy to use, and is remarkably useful.

In my use case I am asking customers to learn PRQL; and having to ask them to learn regex when LIKE is all they need feels unnecessary.

armanm avatar Dec 23 '22 12:12 armanm

A couple thoughts:

  • I am a huge fan of (and minor contributor to) PRQL. It definitely makes it easier to write SQL. But I think it's bold to ask customers to learn it in its current state, especially if they are going to be using it for business processes. The language is very much in an experimental stage. People keep finding bugs, and the language itself is likely to change for a while.

  • That said, you can get LIKE for strings using this function:

    func like fld str -> s"{fld} like '%' || {str} || '%' "
    # and use it like this...
    from foo
    filter (like col1 "abc")
    
  • @max-sixty implied that something else could be done to implement LIKE for columns

  • ~~Finally, I would support bringing LIKE into the language for the reasons you (and I) cited.~~ (I changed my mind. See https://github.com/PRQL/prql/issues/1123#issuecomment-1365224134)

I'm glad you like PRQL!

richb-hanover avatar Dec 23 '22 12:12 richb-hanover

In my use case I am asking customers to learn PRQL; and having to ask them to learn regex when LIKE is all they need feels unnecessary.

Yes, that's fair. (Though "Replace % with .*" isn't that burdensome, but I agree it's annoying, and we should decrease annoyances...)

LIKE is indeed useful and simple. But it's not extensible. I think doesn't exist in any other language. My internal view is that it doesn't pull its weight as a keyword (though ofc open-minded, and I'm one vote among many.)


How about if we made in work, like "Amazo" | in company_name? That gets us LIKE %foo%, but not LIKE %foo — do you think that's general enough, and then reverting to regexes if people need something more specific?

But a difficulty — unless in defaults to strings (rather than arrays), we'd need to know the type of the column, which we often wouldn't know, and so would require "Amazo" | in company_name<column<string>> or similar, which really isn't simple...

max-sixty avatar Dec 25 '22 22:12 max-sixty

How about if we made in work, like "Amazo" | in company_name? That gets us LIKE %foo%, but not LIKE %foo — do you think that's general enough, and then reverting to regexes if people need something more specific?

Personally a partial support for LIKE falls short for no obvious reason from a user's point of view.

What about company_name::string | in "[%]Amazo[%]"? (square brackets mean optional)

It will support wildcard on both sides of the string but if % is omitted, it will behave as LIKE %Amazo%. If you go down the path of specifying column types I posit COLUMN_NAME::TYPE syntax is far more familiar to SQL users than company_name<column<string>>.

armanm avatar Dec 26 '22 07:12 armanm

I have changed my mind. Seeing that Milestone 5 lists Regex as a goal, I think it decreases the importance of adding LIKE facility since it can be worked around with the like function above.

And the simplicity and consistency of filter company_name =~ "Amazo" further decreases my desire for LIKE.

And my thoughts in #410 mean that I don't feel the urge to burden further versions of the language with this not-very-general feature. Thanks.

richb-hanover avatar Dec 26 '22 14:12 richb-hanover

What about company_name::string | in "[%]Amazo[%]"? (square brackets mean optional)

My concern with this is that we're inventing a new regex format. While that example is good, we then need to add semantics for escaping [ & %...

My inclination is to implement #562, and then assess whether that's sufficient. OTOH it's possible that folks who strongly prefer LIKE over regexes also aren't the folks who are opening GH issues, but possibly we can get the context from those such as @armanm ...

max-sixty avatar Dec 26 '22 22:12 max-sixty

Consider using * instead of % as it is more commonly used as a wildcard character.

vanillajonathan avatar Feb 11 '23 23:02 vanillajonathan

Closing in favor of #562 as discussed there

max-sixty avatar Apr 14 '23 01:04 max-sixty