efcore icon indicating copy to clipboard operation
efcore copied to clipboard

Query hints (raw SQL, such as OPTIONs)

Open rowanmiller opened this issue 9 years ago • 110 comments
trafficstars

An example from https://github.com/aspnet/EntityFramework/issues/6649 is being able to append OPTION (MAXRECURSION 2000) to the end of a query.

rowanmiller avatar Oct 07 '16 17:10 rowanmiller

While the idea of integrating arbitrary provider-specific hints in queries seems useful, treating this as some raw SQL to be tacked at the end of the query may be too limited. A specific provider's SQL dialect may require the additional SQL to be integrated somewhere in the middle, etc...

Some arbitrary annotations on the query itself which would be picked up by the provider-specific query SQL generator may cover more cases.

roji avatar Oct 07 '16 20:10 roji

@roji yeah I wasn't intending to limit it to appending text to the end of the query - I removed the word "append" from the title 😄 .

rowanmiller avatar Oct 07 '16 22:10 rowanmiller

Ok great :)

roji avatar Oct 08 '16 10:10 roji

Note from triage: we need to investigate what commonality exists here across relational database systems. For example, where can hints be attached? Is there non-SQL Server value in a query-hint only (i.e. always at the end of the query) solution?

/cc @roji @caleblloyd @ErikEJ

ajcvickers avatar May 17 '18 21:05 ajcvickers

Here's the PostgreSQL documentation page on querying.

I'm not sure how many things it has which can be tacked on without impacting the shape of the results (in which case they'd be beyond the scope of this issue). Some ideas that come to mind:

  • ORDER BY ... NULLS { FIRST | LAST }, impacting the sort order.
  • Locking the selected rows (FOR UPDATE). Users would then be able to do stuff like issue updates in raw SQL, keeping the rows locked until the end of the transaction. It's nice to be able to express the query (i.e. set of rows to lock) with EF Core LINQ.

Seems like ideally providers would be able to define an extension method on the IQueryable (like AsNoTracking()), which would be picked up by the SQL generator.

roji avatar May 18 '18 06:05 roji

Design Notes:

  • Since query/table hints are a lot provider specific, there is no good relational level API (yet). Enums for options will have issue with provider specific code in relational. String version cannot be used as is since it would be append to SQL query. So it is left upto each provider to add individual methods (or 1 common method) for users to specify hints. Details of implementation is upto the provider writers. In turn relational level will provide infrastructure for providers. One easy idea is to use QueryAnnotations (how query tagging works at present), to flow information from method to SelectExpression. It could be generalized (and combined with Tagging implementation), for select expression to have access to query annotations. Provider would give query annotations from their methods, which will reach select expression through relational layer. And while printing out SQL providers could write appropriate SQL based on the annotations. This would allow providers to implement more than just table or query level hints. (e.g. Null ordering)

  • Query level hints are once per query so it can be applied to whole query. Table level hints are slightly more complicated since user may want to apply hint to one table but not the other. The API may need that granularity based on that. Further API needs to consider what would happen for the tables which are indirectly brought in through auto-include of owned entities or nav expansion. Also if API takes the type to fetch table then owned types could cause question which owned type it would apply to. General initial approach would be all or nothing. Using API to say No Lock on tables will apply to all tables in the query, however they are brought into the SQL. Based on customer feedback more granular API can be determined.

Changes to relational level remain pre-requisite for provider level work but the issue remains in backlog for now. Community contributions appreciated.

smitpatel avatar Jun 14 '18 18:06 smitpatel

cc: @ralmsdeveloper

smitpatel avatar Jun 14 '18 18:06 smitpatel

Another use case: specifying per-query collation. PostgreSQL allows the following on an expression:

SELECT a < b COLLATE "de_DE" FROM test1;

or on a clumn:

SELECT a COLLATE "de_DE" < b FROM test1;

roji avatar Jun 14 '18 19:06 roji

@smitpatel agree that the way forward here is by allowing annotations to be specified by the user in the query, and for those annotations to flow to the provider's query SQL generator. The question is indeed granularity and which elements of the query expression tree can be thus annotated.

roji avatar Jun 14 '18 19:06 roji

Thoughts: What @roji spoke makes sense to me, I believe we could create a method for annotation on EFCore.Ralational to meet the following needs, following SQL Server as the basis, but the providers would be responsible for consolidating the end information .

We would have to have one more property in TableExpression (public virtual string Hint {get;set;}).

Remember that when using the annotation, this would be replicated in the "Include" tables: x.Include(y => y.Table)

Or have one more parameter in the extension method: "NoInclude"

public static IQueryable<TEntity> With<TEntity>(
    this IQueryable<TEntity> source,
    string hint,
    string optionalParameter = null);

Sample

var query = _db
    .Blogs 
    .With("NOLOCK", "INDEX(myIndex)") // Second parameter optional
    .ToList();

Output SQL

SELECT [p].[Id], [p].[Date], [p].[Name]
FROM [Blogs] AS [p] WITH (NOLOCK, INDEX(myIndex))  

--or 

SELECT [p].[Id], [p].[Date], [p].[Name]
FROM [Blogs] AS [p] WITH (NOLOCK)  

We could also use Enum to do this:

This would prevent user typing error.

PAGLOCK, NOLOCK, READCOMMITTEDLOCK, ROWLOCK, TABLOCK ou TABLOCKX.

Sample

var query = _db
    .Blogs 
    .With(Hints.NOLOCK)
    .ToList();

For the @roji example: https://github.com/aspnet/EntityFrameworkCore/issues/6717#issuecomment-397403491

I believe we can do this in EF.Functions.

ralmsdeveloper avatar Jun 14 '18 23:06 ralmsdeveloper

Or just:

public static IQueryable<TEntity> Hint<TEntity>(
    this IQueryable<TEntity> source,
    string hint,
    bool repeatForInclude = true);
var query = _db
    .Blogs 
    .Hint("WITH (NOLOCK)")
    .ToList();

This is left under the responsibility of the user.

In my thinking I believe that this also becomes more flexible, the user can theoretically use all available hints, since initially this would be designed only for queries!

ralmsdeveloper avatar Jun 15 '18 01:06 ralmsdeveloper

@ralmsdeveloper - That is what we decided explicitly not to do it. Unless there is any common query hint across all providers.

Especially we cannot use any method with string parameter where string is going to be appended in SQL, that is just asking for SQL injection. And the enum may require different values based on provider. Hence in the current design, there will not be any method in relational provider. Providers have to write methods in whichever way they want. It could be single method with enum values or it could be individual methods for each hint.

Relational layer will just provide the storage facility in terms of query annotations. So provider methods can add query annotations, (whatever shape), and those annotations will be flowed to SQL generator by relational, where provider can look at annotations and determine what to write in SQL. Further it also avoids breaking TableExpression since there is no need to add anything like Hint in any expression. "Hints" will be available globally to SelectExpression while printing, providers can determine where each hint will be printed.

smitpatel avatar Jun 15 '18 05:06 smitpatel

In addition to what @smitpatel wrote above, a simple API to tack strings inside the query is also a pretty bad user experience - the user would need to have knowledge about specific strings and where they go inside the query. It's much better to have provider-defined extension methods which add structured annotations, which are then picked up by the query SQL generator and rendered in SQL. This is also in line with the current way of specifying provider-specific settings on the model, which get translated to SQL by the migrations generator.

@smitpatel if I understand correctly the initial plan is to only allow annotations on the SelectExpression? While it's true this should cover most basic cases, it's a good idea to at least consider what the more flexible/advanced version of this would look like. As I've written above there are viable cases where a column expression, or indeed even a comparison expression could be annotated (see the collation example). I'm not saying these should be supported by day one (in fact I've received no actual request for anything of the sort) but it's good to keep it in mind.

roji avatar Jun 15 '18 07:06 roji

I get it

Especially we cannot use any method with string parameter where string is going to be appended in SQL, that is just asking for SQL injection. And the enum may require different values based on provider. Hence in the current design, there will not be any method in relational provider. Providers have to write methods in whichever way they want. It could be single method with enum values or it could be individual methods for each hint.

Like I said before, it's just thoughts. My view was just in annotation to table, so I talked about creating another property, why providers could override "VisitTable"

However, if we can retrieve anottations from SelectExpression would be enough, so the writers of the providers can simply do what they think best, to meet the more specific needs of each provider.

ralmsdeveloper avatar Jun 15 '18 14:06 ralmsdeveloper

@roji - The current plan is to flow QueryAnnotations to SelectExpression. While it is obvious that query hints (SelectExpression level annotations) are covered by that, you can still use the same infra to have table hints (or other constructs like ordering null) since you have select expression available while printing out table or order by clause. So it is not restricted to just query hints, it could be any part of SQL. I believe it would be better way to transport the data instead of creating properties on individual expressions (like TableExpression/JoinExpression etc.). Perhaps the restriction is on API side, in terms of providers can add API on IQueryable only. Though the methods could take whatever parameters needed to locate the piece of data in SQL.

For the column/comparison example you gave above, how granular it becomes in the query? Do you have any ideas of API. One thing we have in mind for specifying store type for column (since current inference is not great, and perhaps EF6 had similar AsUnicode API), is to have EF.Functions.StoreType kind API, which can be handled as MethodCallTranslator, the same way how providers can currently add custom methods on EF.Functions.

smitpatel avatar Jun 15 '18 17:06 smitpatel

@roji - The current plan is to flow QueryAnnotations to SelectExpression. While it is obvious that query hints (SelectExpression level annotations) are covered by that, you can still use the same infra to have table hints (or other constructs like ordering null) since you have select expression available while printing out table or order by clause. So it is not restricted to just query hints, it could be any part of SQL. I believe it would be better way to transport the data instead of creating properties on individual expressions (like TableExpression/JoinExpression etc.). Perhaps the restriction is on API side, in terms of providers can add API on IQueryable only. Though the methods could take whatever parameters needed to locate the piece of data in SQL.

I understand, but unless I'm mistaken that would make it impossible (or very awkward) to apply different hints to different parts of the query, right? For example, if I want want ORDER BY to sort nulls first, and another to sort nulls last, it would be very difficult to express that, won't it?

For the column/comparison example you gave above, how granular it becomes in the query? Do you have any ideas of API. One thing we have in mind for specifying store type for column (since current inference is not great, and perhaps EF6 had similar AsUnicode API), is to have EF.Functions.StoreType kind API, which can be handled as MethodCallTranslator, the same way how providers can currently add custom methods on EF.Functions.

Just to be extra clear - this seems to be a pretty rare requirement and I'm fairly certain 95% of needs will be met by specifying an annotation "globally" at the query level.

For the column/comparison example you gave above, how granular it becomes in the query? Do you have any ideas of API. One thing we have in mind for specifying store type for column (since current inference is not great, and perhaps EF6 had similar AsUnicode API), is to have EF.Functions.StoreType kind API, which can be handled as MethodCallTranslator, the same way how providers can currently add custom methods on EF.Functions.

EF.Functions.StoreType() does seem like a simple case that can probably be handled with a simple method call translator (although maybe consider a shorter name such as EF.StoreType()?). In the more general case I'm not sure exactly what you're asking with respects to granularity... collation can be specified on each single column, or on a comparison expression involving two columns.

roji avatar Jun 15 '18 17:06 roji

  • It indeed difficult to design an API for such cases. Table hints is also one such case. What if the user wants to apply No Lock on 1 table and not on the other. Due to difficulty in arriving at design of it, we differed to feedback from users. The No Lock at query level could/should/would append No Lock to each table appearing in the query. Order By falls under same umbrella. Especially if user wants to specify nulls first/last differently on different order bys then perhaps a better options would be overload of OrderBy (just throwing out some ideas)`. Of course how to flow that information is still unknown but certainly cannot be query level annotations.

  • As you said, 95% cases should be covered by "global" level. We can look into the API and data flow, once 5% comes up with feature request to add "granular" control.

  • By "granularity", what I mean is, if user needs to pin point the part of query it needs to applied (like particular column or comparison) then query level "hints" (methods) are not suitable for that API. Hence out of the scope of this issue.

If you have ideas for API for things which cannot be specified at "global" level, then please file a new issue with API ideas, we can explore how to flow information to SQL.

smitpatel avatar Jun 15 '18 17:06 smitpatel

Because my English is bad, I prefer to write short texts.

@smitpatel then look at this here: https://docs.microsoft.com/en-us/sql/t-sql/queries/hints-transact-sql?view=sql-server-2017

Not everyone has this need, but to make a provider much more powerful and flexible, some of these HINTS should be supported at least: https://docs.microsoft.com/en-us/sql/t-sql/queries/hints-transact-sql-table?view=sql-server-2017#syntax

It's worth mentioning that this is a totally isolated SQL Server thing.

Actually now would be to know what kind of API would be global, since the cases are totally different for SQL Server and PostgreSQL.

Then we would have to have two methods:

  • WithNoLock
  • WithLock

I believe that the utility method for the PostgreSQL Provider would be only WithLock, which would translate something like:

SELECT * FROM BAR FOR UPDATE

When I thought about using ENUM I was thinking of supporting all this in one way:

  | FORCESCAN
  | FORCESEEK
  | HOLDLOCK
  | NOLOCK
  | NOWAIT
  | PAGLOCK
  | READCOMMITTED
  | READCOMMITTEDLOCK
  | READPAST
  | READUNCOMMITTED
  | REPEATABLEREAD
  | ROCKLOCK
  | SERIALIZABLE
  | SNAPSHOT
  | TABLOCK
  | TABLOCKX
  | UPDLOCK
  | XLOCK

ralmsdeveloper avatar Jun 15 '18 18:06 ralmsdeveloper

@ralmsdeveloper - Yes, there can be methods like WithNoLock & WithLock but it would be in SqlServer provider. Same way Postgre can add more. Unless we identify hints which are "uniform" across multiple providers like SqlServer, PostgreSQL, SQLite, MySQL, Oracle etc, adding a relational surface may cause more burden on provider writers. (Also there would not be a good default for SQL which we need and use in DefaultQuerySqlGenerator.

smitpatel avatar Jun 15 '18 20:06 smitpatel

@smitpatel & @roji, sorry for my poor communication. But I'm working hard to communicate better with you.

@smitpatel Anyway, what you said was very clear to me now, that would be the way. As a suggestion, there could be a parameter in these methods, as I mentioned above. repeatForInclude for the same annotation if replicate the remaining query tables (INNER JOIN / SUBQUERY)

ralmsdeveloper avatar Jun 15 '18 21:06 ralmsdeveloper

Looks like SQLite has just one hint: INDEXED BY

And NOT INDEXED ...for all those times you wish the query planner would just stop using all those pesky indexes. 😖⁉

bricelam avatar Jun 19 '18 16:06 bricelam

Would this also be supported under this, or is it another issue, or is there a current way to implement it?

WITH CHANGE_TRACKING_CONTEXT (context)

https://docs.microsoft.com/en-us/sql/relational-databases/system-functions/with-change-tracking-context-transact-sql?view=sql-server-2017

bbakermmc avatar Jun 23 '18 06:06 bbakermmc

Just to reiterate the importance of something like EF.StoreType(value, storeType), we've recently seen lots of issues where there's a need to explicitly specify store types when the same CLR type can be mapped to more than one store type (see bottom of issue for some examples).

I just realized that this is a bit more complicated than a simple method translator, since we need to do it for literals/constants as well. In other words, when I specify a literal in my query, EF Core looks up the mapping for that CLR type and uses that to render a literal SQL representation - but we need to be able to specify another type. This choice should affect the mapping that is selected, rather than add a server-side casting operator.

I'm not sure that this problem (both the parameter version and the literal version) is best tracked by this issue, which is more about flowing annotations to the provider's SQL generator. It's also more urgent than general-purpose query SQL hints. Maybe we should split it off and handle separately?

List of scenarios where this is needed:

  • The recent geometry/geography discussion (we may end up mapping NetTopologySuite's geometry types to both database geometry and geography)
  • PostgreSQL has a 6-byte macaddr type and a newer 6- or 8-byte macaddr8 type, both mapped to CLR PhysicalAddress. PhysicalAddress is mapped to the older macaddr by default (backwards compat), and working with the newer one has issues.
  • CLR DateTime is mapped to PostgreSQL timestamp but also to date (since .NET has no date-only type). This again causes some severe issues when trying to work with the non-default mapping (i.e. date). See https://github.com/npgsql/Npgsql.EntityFrameworkCore.PostgreSQL/issues/493#issuecomment-401086741 (fortunately NodaTime can be used to work around this).

roji avatar Jun 29 '18 05:06 roji

@roji Thanks for the input. We're tracking this feature as #4978. I have removed that issue from the backlog so we can discuss priority in triage.

ajcvickers avatar Jun 29 '18 16:06 ajcvickers

Oh sorry, I forgot that #4978 already existed. Thanks for giving this your attention.

roji avatar Jun 29 '18 16:06 roji

Hi, I'm not sure from the discussion where the current thinking on QUERY/TABLE hints has landed and whether a solution is in mind. IMO, the biggest need for many developers is simply control of locking during queries (i.e. table hints). Just that one feature would be really welcome.

For EF 6, we created a With() extension -- just like Rafael described (except with an enum to avoid injection issues). In our implementation, the With() call has to come after an entity collection so we know how to translate the expression and correctly associate the injected WITH statement to the right FROM or JOIN. We also support accepting an index hint, but the feature never gets used (and it's a potential injection issue because the caller passes the index name as a string).

Anyway, the lack of any way to set a table hint for locking in EF Core is our main blocker to migrating our entire commercial ERP app to .Net Core. So I'm super interested in how this issue progresses. Thanks,

-Erik

erikj999 avatar Jul 02 '18 21:07 erikj999

@erikj999 great to hear from you! I have a couple of questions:

  1. What if we gave you the ability to control locking at the query granularity level instead of at the table level? That is, you specify the locking hint once on the query and we take care of propagating it to all the intervening tables. Would that be something your application could use? In other words, how important is it to be able to specify something like NOLOCK only for some of the tables or completely different locking hints on different tables in the same query?
  2. What are the locking hints you use?

divega avatar Jul 03 '18 10:07 divega

Hi @divega, great to hear from you, too. Here's what worries me about propagating a lock hint throughout a query. With READ COMMITTED SNAPSHOT mode, which we use and I think is the default for SQL Azure and given the query:

BEGIN TRANSACTION -- Otherwise nothing gets locked SELECT c.Company, c.CustNum, c.Name FROM Customer AS [c] WITH(UPDLOCK) JOIN Company AS [c1] ON (c.Company = c1.Company) WHERE c1.Country = N'USA'

SQL will hold update intent locks on the Customer table, just as the hint prescribes, but won't hold any locks on the joined Company table. So if a Linq statement includes a lock hint and EF Core propagates that lock hint across all tables in the query, that behavior will be inconsistent with SQL and would cause undesired locking.

IMO, if we only get one table hint to specify in an EF query, then it should only be applied to the FROM table only. It would be better if we could have a table hint attached to any entity in the query, where it would be applied correctly to the FROM or JOIN when the T-SQL is expressed, to make the lock behavior more explicit. Hope that makes sense.

The lock hints we use are:

NOLOCK (does nothing but make us feel good, considering we use Read-Committed Snapshot) HOLDLOCK READPAST UPDLOCK (We use this one a ton) FORCESEEK

One hint we do not use, but I see used in the wild is NOWAIT which is used in logic with retry constructs.

-Erik

erikj999 avatar Jul 04 '18 15:07 erikj999

lol @ size~1-week tag!!

I am happy to build a cross-reference of query hints across database engines and versions. Could take some time. Note sure I understand why QueryAnnotations should be used for this. I think if @smitpatel could demo NOLOCK hint with SQL Server, that would be helpful, but there are just so many additional user stories I would like to see capture:

  • Some way to reference indexes in a ModelBuilder for index hints, so that if I drop an index, my ModelBuilder will complain and it will be caught in unit tests and then developers will see the reference to the index in a query in some repository somewhere.
  • Dove-tailing on this, I always wanted to fluently create covering indexes in a ModelBuilder, and then say on my query, UseCoveringIndex() or something similar to say that the query probably has a covering index (ideally, if EF metadata could not find a covering index, emit a warning).
    • I have proxied the effect of this via erecruit.Expr and Subst.Expr to substitute a UseCoveringIndex<Entity>() with the fields I wanted covered. I could then use reflection in an integration test with the generated database to ensure there exists a covering index for those columns.
  • Some way to associate a plan guide with a query. This has always sucked for DBAs, since plan guide's are tied to specific SQL text hashes. Note, as I mentioned in https://github.com/aspnet/EntityFrameworkCore/issues/12669, AddTags or even an AddComments API would generate multiple cached queries and effectively redundant plans.

jzabroski avatar Aug 06 '18 20:08 jzabroski

It's going to take >1 week just to review this thread :).

I looked for a query hint matrix online thinking someone must have done this before, but couldn't find much, especially for lock control. And after looking across the database engines with providers in EF Core, I came away thinking it's not really possible to put a useful query hint feature into EF Core at the "Relational" level. Even NOLOCK isn't universally supported.

I think table/query hints should be implemented as provider-specific extensions e.g.:

      (from x in db.Things.With(LockHint.UpdLock) select new {x.this, x.that}).MaxDOP(4); 

...where the table hint With() and MaxDOP are extensions method in the SqlServer provider namespace. And use either with a non-SQLServer provider would throw a runtime error. I agree it's good to keep away from hints that require string arguments (like index names) because of the potential for injection attacks. Maybe index hints could be expressed in the model, but that might preclude provider-agnostic models.

BTW, I think the code that creates the T-SQL syntax for the SQL Server provider is in the Relational library -- not the SqlServer provider itself. I'm guessing by having it there, it serves as a default syntax generator for provider projects. But if query hints get promoted to the road map and can't be generalized usefully from a relational POV, that code would be likely forked, right?

erikj999 avatar Aug 06 '18 21:08 erikj999