Improve translation of .Any(s => s.StartsWith()) to use LIKE ANY
Following #422, we can add translation of {Starts,Ends}With() within Any() or All() to PostgreSQL LIKE ANY and LIKE ALL.
Beware LIKE wildcards (% and _) which should not be expanded for {Starts,Ends}With(). If the expression is constant, we can simply escape these client side before rendering the constant expression (see NpgsqlStringStartsWithTranslator for an example). For non-constant expressions we probably can't do anything, although some investigation may provide other ideas.
@roji I'll grab this one too. Do you want to milestone this for 2.1.1 or hold it for 2.2.0?
Assuming this is a small and unobtrusive change - as I imagine it will be if we limit to translating only constant expressions - it can probably be released with 2.1.1. You can start working on this and we'll make the decision based on the change.
It turns out that (as advertised) this is a bit of a pain for anything other than constant expressions.
Setting aside wildcard escaping (%, _), just appending a wildcard is complicated.
I've been looking at this form of translation:
string[] items = new string[] { "a", "b", "c" };
src.Where(x => items.Any(y => x.StartsWith(y)));
WHERE @__x_0 LIKE ANY('{a%,b%,c%}');
In practice, I have queries for which the items array contains dozens to hundreds of values. In EF6, I have some hackish code the creates a painful number of OR LIKE conditions such as:
WHERE @__x_0 LIKE 'a%' OR @__x_0 LIKE 'b%' OR @__x_0 LIKE 'c%'
What I would like to do here is generate something like:
WHERE @__x_0 LIKE ANY(SELECT y || '%' FROM unnest(@__items_0) as y)
But I run into trouble when it comes time to generate the subquery: https://github.com/npgsql/Npgsql.EntityFrameworkCore.PostgreSQL/blob/196b94c0760f495d96cdf01858211f00dd583e66/src/EFCore.PG/Query/Sql/Internal/NpgsqlQuerySqlGenerator.cs#L180-L194
(Here, arrayAnyAllExpression.Array has been set to a SubQueryExpression.)
Which throws:
System.NotImplementedException : VisitSubQuery
at Remotion.Linq.Parsing.ThrowingExpressionVisitor.VisitUnhandledItem[TItem,TResult](TItem unhandledItem, String visitMethod, Func`2 baseBehavior)
at Remotion.Linq.Parsing.ThrowingExpressionVisitor.VisitSubQuery(SubQueryExpression expression)
at Remotion.Linq.Clauses.Expressions.SubQueryExpression.Accept(ExpressionVisitor visitor)
at System.Linq.Expressions.ExpressionVisitor.Visit(Expression node)
at Remotion.Linq.Parsing.ThrowingExpressionVisitor.Visit(Expression expression)
at Npgsql.EntityFrameworkCore.PostgreSQL.Query.Sql.Internal.NpgsqlQuerySqlGenerator.VisitArrayAnyAll(ArrayAnyAllExpression arrayAnyAllExpression)
I have a branch with some exploratory work, but I'm not quite sure how best to approach this.
Overriding VisitSubQuery seems like an obvious option, but I haven't found examples of other providers/EF Core implementing translations like this.
@roji Do you have any thoughts on how to proceed?
AFAIK I don't think other providers do this kind of fancy translation at the moment - you're on the cutting edge.
Just a random (and useless thought)... For your specific example above, it would be possible (and possibly more efficient) to simply do a substring on the each array value and comparing would be faster (i.e. WHERE SUBSTR(@__x_0, 1, 1) IN @__items_0). However, that assumes that all the elements in the array are of the same length.
Much more importantly... before going too far with any translation option here, I'd carefully check performance using raw SQL. It's entirely possible that the ugly solution (lots of ORs) would be faster than the more concise but also more complex option using unnest(). I suspect the latter would involve more internal copying, and the former may be able to use certain indices where the latter probably won't. It's a bad idea for us to provide any translation which actually does something inefficient (but with nicer SQL), it's probably better not to translate at all.
Apart from that, I imagine that overriding VisitSubQuery() would be similar to any other sort of expression tree rewriting (you've already done some similar work)...
I am not sure about EF Core but in EF6 there is a way to translate Enumerable.Contains to sql in expression by returning true from SupportsInExpression. Check this link for more details: Native support for Enumerable.Contains
@Giorgi I'm not 100% knowledgeable about this, but EF Core does generate InExpressions natively - support is already built-in (see InExpression, DefaultQuerySqlGenerator and other places).
However, this issue is about .Any(s => s.StartsWith()), which cannot be translated to SQL IN, but rather to more advanced constructs which don't even necessarily exist in other database which aren't PostgreSQL (i.e. LIKE ANY).
@roji -- Has support for this been sorted out in EF Core yet? Any workarounds?
@atrauzzi I've just checked, and the query above translates in EF 8.0 as part of the general added support for primitive collections (post). However, it isn't the tighter PG-specific LIKE ANY syntax detailed above:
SELECT b."Id", b."Name"
FROM "Blogs" AS b
WHERE EXISTS (
SELECT 1
FROM unnest(@__items_0) AS i(value)
WHERE i.value IS NOT NULL AND left(b."Name", length(i.value)) = i.value)
I'll put this issue into 8.0 in case I get some free time to improve the translation here, though with everything that's going on it's quite unlikely I will.
Love it, thanks!