prql
prql copied to clipboard
filter foo LIKE "%abc%"?
I frequently rely on the SQL LIKE operator to match rows. Is there an equivalent way to say this in PRQL? Thanks.
It's a good question; I also use LIKE given its simplicity.
It's not that extensible, unlike regex, so one option is to exclusively use regex; https://github.com/prql/prql/issues/562
But regex is not completely trivial, and much of the time we're just asking "is this string in this other string?". Possibly we start with the regex impl and see how that goes. As ever, open to thoughts on this.
It's not that extensible, unlike regex...
Yeah, but ... Everyone learns about LIKE %string% in their first week of SQL class. It's real easy to use, and is remarkably useful.
I'm very comfortable with creating complex regexes if necessary.
But I would argue that LIKE goes a long way toward solving the problem, i.e., your example: "is this string in this other string?". (I think) it has the same semantics as foo == 'abc', so the parser doesn't change much, and we hand off processing to SQL.
It believe it would be useful to include it. Thanks.
@max-sixty So I figured an S-String could help me out. So I created this function:
func like fld str -> s'{fld} like "%{str}%"'
from foo
select bar
filter (like bar "stuff")
It almost worked. But the WHERE clause's string has extra 's in it...
SELECT
bar
FROM
foo
WHERE
bar like "%'stuff'%"
I think the WHERE clause should be bar like "%stuff%" (no ')
I agree this is quite confusing and I had to take a few minutes to work out what was going on.
I think we want:
- func like fld str -> s'{fld} like "%{str}%"'
+ func like fld str -> s"{fld} like '%' || {str} || '%' "
SELECT
bar
FROM
foo
WHERE
bar like '%' || 'stuff' || '%'
The issue here is that "stuff" is a string, and so gets placed in the s-string as a string — i.e. interpolating both the s-string and the string within the s-string — and so gets quotes around it.
I'm not sure if that explanation makes it more clearer or more confusing though...! But the example I think should work, at least...
Ahah! So IIUC, an S-string interpolates a table/column name by dumping that string into the S-string. An S-string interpolates a string by wrapping it in ' ... '. (True? How does this get into the documentation?)
In an one-out-of-one test, func like fld str -> s"{fld} like '%' || {str} || '%' " works great for SQLite. Thanks.
Follow-up thought: This is confusing - especially requiring people to keep track of the difference between a column name and a string.
What bad thing would happen if someone wrote func like fld str -> s'{fld} like "%{str}%"' and PRQL never added ' around the string, and produced bar like "%stuff%"
What bad thing would happen if someone wrote
func like fld str -> s'{fld} like "%{str}%"'and PRQL never added'around the string, and producedbar like "%stuff%"
But then how would someone express a string? e.g. s"{col} = {x}" — sometimes x` would be a column and sometimes a string — we wouldn't want to lose the string there.
I agree it's complicated; I think in this case it's possibly warranted in its complication — I expect "interpolate a string within an s-string" won't be a regular thing. Though as ever I'd be very happy to be wrong! We shall see...
I'll close but feel free to follow-up
Ah, actually this issue is about the language feature, not the s-string question, reopening
Yeah, but ... Everyone learns about LIKE %string% in their first week of SQL class. It's real easy to use, and is remarkably useful.
In my use case I am asking customers to learn PRQL; and having to ask them to learn regex when LIKE is all they need feels unnecessary.
A couple thoughts:
-
I am a huge fan of (and minor contributor to) PRQL. It definitely makes it easier to write SQL. But I think it's bold to ask customers to learn it in its current state, especially if they are going to be using it for business processes. The language is very much in an experimental stage. People keep finding bugs, and the language itself is likely to change for a while.
-
That said, you can get
LIKEfor strings using this function:func like fld str -> s"{fld} like '%' || {str} || '%' " # and use it like this... from foo filter (like col1 "abc") -
@max-sixty implied that something else could be done to implement
LIKEfor columns -
~~Finally, I would support bringing
LIKEinto the language for the reasons you (and I) cited.~~ (I changed my mind. See https://github.com/PRQL/prql/issues/1123#issuecomment-1365224134)
I'm glad you like PRQL!
In my use case I am asking customers to learn PRQL; and having to ask them to learn regex when LIKE is all they need feels unnecessary.
Yes, that's fair. (Though "Replace % with .*" isn't that burdensome, but I agree it's annoying, and we should decrease annoyances...)
LIKE is indeed useful and simple. But it's not extensible. I think doesn't exist in any other language. My internal view is that it doesn't pull its weight as a keyword (though ofc open-minded, and I'm one vote among many.)
How about if we made in work, like "Amazo" | in company_name? That gets us LIKE %foo%, but not LIKE %foo — do you think that's general enough, and then reverting to regexes if people need something more specific?
But a difficulty — unless in defaults to strings (rather than arrays), we'd need to know the type of the column, which we often wouldn't know, and so would require "Amazo" | in company_name<column<string>> or similar, which really isn't simple...
How about if we made
inwork, like"Amazo" | in company_name?That gets usLIKE %foo%, but notLIKE %foo— do you think that's general enough, and then reverting to regexes if people need something more specific?
Personally a partial support for LIKE falls short for no obvious reason from a user's point of view.
What about company_name::string | in "[%]Amazo[%]"? (square brackets mean optional)
It will support wildcard on both sides of the string but if % is omitted, it will behave as LIKE %Amazo%. If you go down the path of specifying column types I posit COLUMN_NAME::TYPE syntax is far more familiar to SQL users than company_name<column<string>>.
I have changed my mind. Seeing that Milestone 5 lists Regex as a goal, I think it decreases the importance of adding LIKE facility since it can be worked around with the like function above.
And the simplicity and consistency of filter company_name =~ "Amazo" further decreases my desire for LIKE.
And my thoughts in #410 mean that I don't feel the urge to burden further versions of the language with this not-very-general feature. Thanks.
What about
company_name::string | in "[%]Amazo[%]"? (square brackets mean optional)
My concern with this is that we're inventing a new regex format. While that example is good, we then need to add semantics for escaping [ & %...
My inclination is to implement #562, and then assess whether that's sufficient. OTOH it's possible that folks who strongly prefer LIKE over regexes also aren't the folks who are opening GH issues, but possibly we can get the context from those such as @armanm ...
Consider using * instead of % as it is more commonly used as a wildcard character.
Closing in favor of #562 as discussed there