efcore.pg icon indicating copy to clipboard operation
efcore.pg copied to clipboard

Improve translation of .Any(s => s.StartsWith()) to use LIKE ANY

Open roji opened this issue 7 years ago • 9 comments

Following #422, we can add translation of {Starts,Ends}With() within Any() or All() to PostgreSQL LIKE ANY and LIKE ALL.

Beware LIKE wildcards (% and _) which should not be expanded for {Starts,Ends}With(). If the expression is constant, we can simply escape these client side before rendering the constant expression (see NpgsqlStringStartsWithTranslator for an example). For non-constant expressions we probably can't do anything, although some investigation may provide other ideas.

roji avatar May 29 '18 20:05 roji

@roji I'll grab this one too. Do you want to milestone this for 2.1.1 or hold it for 2.2.0?

austindrenski avatar May 29 '18 21:05 austindrenski

Assuming this is a small and unobtrusive change - as I imagine it will be if we limit to translating only constant expressions - it can probably be released with 2.1.1. You can start working on this and we'll make the decision based on the change.

roji avatar May 30 '18 08:05 roji

It turns out that (as advertised) this is a bit of a pain for anything other than constant expressions.

Setting aside wildcard escaping (%, _), just appending a wildcard is complicated.

I've been looking at this form of translation:

string[] items = new string[] { "a", "b", "c" };
src.Where(x => items.Any(y => x.StartsWith(y)));
WHERE @__x_0 LIKE ANY('{a%,b%,c%}');

In practice, I have queries for which the items array contains dozens to hundreds of values. In EF6, I have some hackish code the creates a painful number of OR LIKE conditions such as:

WHERE @__x_0 LIKE 'a%' OR @__x_0 LIKE 'b%' OR @__x_0 LIKE 'c%' 

What I would like to do here is generate something like:

WHERE @__x_0 LIKE ANY(SELECT y || '%' FROM unnest(@__items_0) as y)

But I run into trouble when it comes time to generate the subquery: https://github.com/npgsql/Npgsql.EntityFrameworkCore.PostgreSQL/blob/196b94c0760f495d96cdf01858211f00dd583e66/src/EFCore.PG/Query/Sql/Internal/NpgsqlQuerySqlGenerator.cs#L180-L194

(Here, arrayAnyAllExpression.Array has been set to a SubQueryExpression.)

Which throws:

System.NotImplementedException : VisitSubQuery
   at Remotion.Linq.Parsing.ThrowingExpressionVisitor.VisitUnhandledItem[TItem,TResult](TItem unhandledItem, String visitMethod, Func`2 baseBehavior)
   at Remotion.Linq.Parsing.ThrowingExpressionVisitor.VisitSubQuery(SubQueryExpression expression)
   at Remotion.Linq.Clauses.Expressions.SubQueryExpression.Accept(ExpressionVisitor visitor)
   at System.Linq.Expressions.ExpressionVisitor.Visit(Expression node)
   at Remotion.Linq.Parsing.ThrowingExpressionVisitor.Visit(Expression expression)
   at Npgsql.EntityFrameworkCore.PostgreSQL.Query.Sql.Internal.NpgsqlQuerySqlGenerator.VisitArrayAnyAll(ArrayAnyAllExpression arrayAnyAllExpression)

I have a branch with some exploratory work, but I'm not quite sure how best to approach this.

Overriding VisitSubQuery seems like an obvious option, but I haven't found examples of other providers/EF Core implementing translations like this.

@roji Do you have any thoughts on how to proceed?

austindrenski avatar Jul 11 '18 16:07 austindrenski

AFAIK I don't think other providers do this kind of fancy translation at the moment - you're on the cutting edge.

Just a random (and useless thought)... For your specific example above, it would be possible (and possibly more efficient) to simply do a substring on the each array value and comparing would be faster (i.e. WHERE SUBSTR(@__x_0, 1, 1) IN @__items_0). However, that assumes that all the elements in the array are of the same length.

Much more importantly... before going too far with any translation option here, I'd carefully check performance using raw SQL. It's entirely possible that the ugly solution (lots of ORs) would be faster than the more concise but also more complex option using unnest(). I suspect the latter would involve more internal copying, and the former may be able to use certain indices where the latter probably won't. It's a bad idea for us to provide any translation which actually does something inefficient (but with nicer SQL), it's probably better not to translate at all.

Apart from that, I imagine that overriding VisitSubQuery() would be similar to any other sort of expression tree rewriting (you've already done some similar work)...

roji avatar Jul 14 '18 17:07 roji

I am not sure about EF Core but in EF6 there is a way to translate Enumerable.Contains to sql in expression by returning true from SupportsInExpression. Check this link for more details: Native support for Enumerable.Contains

Giorgi avatar Dec 01 '18 20:12 Giorgi

@Giorgi I'm not 100% knowledgeable about this, but EF Core does generate InExpressions natively - support is already built-in (see InExpression, DefaultQuerySqlGenerator and other places).

However, this issue is about .Any(s => s.StartsWith()), which cannot be translated to SQL IN, but rather to more advanced constructs which don't even necessarily exist in other database which aren't PostgreSQL (i.e. LIKE ANY).

roji avatar Dec 02 '18 06:12 roji

@roji -- Has support for this been sorted out in EF Core yet? Any workarounds?

atrauzzi avatar Oct 10 '23 20:10 atrauzzi

@atrauzzi I've just checked, and the query above translates in EF 8.0 as part of the general added support for primitive collections (post). However, it isn't the tighter PG-specific LIKE ANY syntax detailed above:

SELECT b."Id", b."Name"
FROM "Blogs" AS b
WHERE EXISTS (
    SELECT 1
    FROM unnest(@__items_0) AS i(value)
    WHERE i.value IS NOT NULL AND left(b."Name", length(i.value)) = i.value)

I'll put this issue into 8.0 in case I get some free time to improve the translation here, though with everything that's going on it's quite unlikely I will.

roji avatar Oct 11 '23 11:10 roji

Love it, thanks!

atrauzzi avatar Oct 11 '23 15:10 atrauzzi