efcore Query: Support for BETWEEN ... AND ...

Available in T-SQL/SQLite/CosmosSql

Jul 11 '18 22:07 smitpatel

Also in PostgreSQL: https://www.postgresql.org/docs/current/static/functions-comparison.html.

Jul 13 '18 16:07 roji

@smitpatel you can see this, maybe it is the simplest solution to this problem. https://github.com/ralmsdeveloper/EntityFrameworkCore/commit/47c2b4a71066c461861a4f477ba9de5099c17fe5

If you like PR, tests will be implemented in SimpleQueryTestBase, and then provider writers will be able to rewrite it if there are any quirks.

Oct 04 '18 23:10 ralmsdeveloper

I would love to see this, what needs to be done to revive this PR @ralmsdeveloper ?

Jun 18 '20 12:06 tomasbruckner

@tomasbruckner @ralmsdeveloper what do you think this would provide beyond a simple x > y AND y <= z?

Jun 18 '20 15:06 roji

@roji I had started a draft but I ran out of time!

Answering your question: It's just syntax, in the end when the query is compiled the result is the same, everything is converted to GTE and LTE.

The only thing that can change is just the order in which the query can be made. BETWEEN can become a limitation.

DECLARE @id int = 3;
select 'OK' where @id between 2 and 4;
select 'OK' where @id between 4 and 2;
select 'OK' where @id>=2 and @id<=4;
select 'OK' where @id<=4 and @id>=2;

Jun 18 '20 16:06 ralmsdeveloper

I also investigated if there's any potential perf benefit and didn't find any... As such I'd really deprioritize this, compared to other issues we have...

Jun 19 '20 00:06 roji

@roji I agree, based on some analysis that I did, and I came to the conclusion that it is a very big responsibility for EF Core, I believe that EF should not really translate that. As I mentioned above, we can have different results if the user passes parameters out of order!

If there are a lot of people wanting the feature, maybe it would be a good idea to take it to EF.Functions, but it's just thoughts!

Jun 19 '20 00:06 ralmsdeveloper

Coming back to this after a while...

We've recently been focusing a bit on reducing duplication of expressions in our generated SQL, especially around null semantics. For example, translating to "x IS NOT DISTINCT FROM y" instead of x = y OR (x IS NULL AND y IS NULL) would avoid duplicating x and y in the expression (#29624); this is valuable especially when x/y aren't simple columns/parameters/scalars, but rather complex arbitrary expressions which may be expensive to evaluate.

This is also an area where BETWEEN is valuable: instead of evaluating x twice (x >= y AND x <= z), it's evaluated only once (x BETWEEN y AND z).

Database support:

SQL Server
PostgreSQL
SQLite
MySQL
MariaDB
Oracle
Firebird (look @cincuranet I research Firebird too now)

Notes:

Implementation-wise, we should probably do this in SQL post-processing, introducing a new optimization step (SqlOptimizer). We already have some needs for other SQL optimizations, such as constant arithmetic evaluation - this is #28036, though that can probably be done earlier in SqlExpressionFactory, whereas this post-processing optimizer should be reserved for larger-scope optimizations such as the transformation to BETWEEN.
There's a special negation syntax (x NOT BETWEEN y AND z), similar to NOT LIKE, NOT IN, NOT EXISTS... We should also implement that.
Since negation is involved here, we probably want to apply De Morgan simplifications early, before the BETWEEN optimization (and others) are applied. These are currently done in SqlNullabilityProcessor (since there's an interaction with null semantics), which is too late; we may want to apply De Morgan earlier as well, as part of this new optimization step.
The alternative would be to implement this optimization in SqlNullabilityProcessing as well; but we really should start avoiding putting everything in there.

/cc @ranma42

Aug 11 '24 08:08 roji

Coming back to this after a while...

We've recently been focusing a bit on reducing duplication of expressions in our generated SQL, especially around null semantics. For example, translating to "x IS NOT DISTINCT FROM y" instead of x = y OR (x IS NULL AND y IS NULL) would avoid duplicating x and y in the expression (#29624); this is valuable especially when x/y aren't simple columns/parameters/scalars, but rather complex arbitrary expressions which may be expensive to evaluate.

This is also an area where BETWEEN is valuable: instead of evaluating x twice (x >= y AND x <= z), it's evaluated only once (x BETWEEN y AND z).

Note that in most cases the original expression will already contain the duplication. AFAICT currently to make EFCore introduce duplication without having it in the original query you should combine Select() and Where(). An example where this causes a bad translation is:

queriable
  .Select(e => new { X = e.A + e.B * EF.Functions.Random(), Entity = e})
  .Where(e => e.X >= y && e.X <= z)
  .Select(e => e.Entity)

(because of #33791)

Database support:

SQL Server

PostgreSQL

SQLite

MySQL

MariaDB

Oracle

Firebird (look @cincuranet I research Firebird too now)

Notes:

Implementation-wise, we should probably do this in SQL post-processing, introducing a new optimization step (SqlOptimizer). We already have some needs for other SQL optimizations, such as constant arithmetic evaluation - this is #28036, though that can probably be done earlier in SqlExpressionFactory, whereas this post-processing optimizer should be reserved for larger-scope optimizations such as the transformation to BETWEEN.

Doing this in the post-processing is certainly possible and I believe it should be reasonably straightforward. OTOH if we plan on improving the behavior around impure functions (rand, etc) it might not be sufficient to cover all of the cases. Nonetheless, it could definitely start in the postprocessing as a first step and then, if/when needed, we could evaluate whether an additional syntax to guarantee it makes sense (maybe we could avoid it completely if in the meantime we get another form of value binding... yes, CTE, I am looking at you).

There's a special negation syntax (x NOT BETWEEN y AND z), similar to NOT LIKE, NOT IN, NOT EXISTS... We should also implement that.

👍

Since negation is involved here, we probably want to apply De Morgan simplifications early, before the BETWEEN optimization (and others) are applied. These are currently done in SqlNullabilityProcessor (since there's an interaction with null semantics), which is too late; we may want to apply De Morgan earlier as well, as part of this new optimization step.

They are already performed quite early: since #34142 they are part of the Not construction: https://github.com/dotnet/efcore/blob/ca891c29e26a1a9f7473ce27299397c2830e7e56/src/EFCore.Relational/Query/SqlExpressionFactory.cs#L646-L652

Overall, I think that doing this as an optimization makes sense and should be straightforward; it could also help with some cases in which side-effects are not respected, but that is a bigger issue, that will also require bigger changes. If the optimization is what we are aiming for (at least for the moment), I think I would be able to tackle it 😎

Aug 11 '24 09:08 ranma42

They are already performed quite early: since https://github.com/dotnet/efcore/pull/34142 they are part of the Not construction

Oh right 🤦 I forgot you did this change... Even better.

There's the question of whether we should be doing this even when the source LINQ query contains duplication, e.g. Where(b => b.Posts.First().Rating >= 2 && b.Posts.First().Rating <= 5). If the expression is impure, doing so could cause unexpected results the other way (though this seems like quite a contrived edge case). Putting impurity aside, there's also the question of performance - performing deep equality here (on b.Posts.First().Rating)) isn't free - though #34149 would make it generally very cheap. In addition, we'd only do this when we already have >= and <= in the same predicate, so that seems fine.

So overall this does seem fine to me, at least until we have a better story for impurity (which we currently don't handle properly in other scenarios as well). Once we know about impure expressions, we can avoid the transformation to BETWEEN the moment there's any impurity detected.

Aug 11 '24 10:08 roji