Vogen icon indicating copy to clipboard operation
Vogen copied to clipboard

Analyzer to spot comparison with primitives in EFCore Linq selectors

Open SteveDunn opened this issue 1 year ago • 6 comments

In a Linq query across a collection I can do this without a problem

var x = list.Where(e => e.Id == 1).Single();

However, the same query in EFCore will fail with an InvalidCastException and a message of 'Invalid cast from 'System.Int32' to 'Domain.Id'

To fix this, an explicit cast is required

var x = context.Entities.Where(e => e.Id == (Id)id).Single();  

or

var x = context.Entities.Where(e => e.Id == Id.From(id)).Single();  

These are difficult to spot at compile time as the Id and int types are equatable.

One way to prevent the runtime errors is to generate the implicit cast, but this allows bypassing the From method and introduces the potential to create invalid instances. Originally posted by @rbanks54 in https://github.com/SteveDunn/Vogen/issues/502#issuecomment-2421061334

SteveDunn avatar Oct 20 '24 12:10 SteveDunn

Just some thoughts...

If the analyzer is applied to all linq queries regardless of their usage, it could break other people's code where they have non-EF queries. Probably not the best outcome 🙂

If the analyzer can detect an EF query by checking if the class being queried inherits from DBContext that would be wonderful. This is this is likely harder that the simple scenario I added as the comparison could be defined in a helper method or specification object that generates a where clause. I don't know how easy it is chasing that back up the call tree to see where it might be used.

The analyzer would also need to check the other form of EF querying as the syntax tree is different; i.e.

var x = (from e in context.Entities
  where e.Id == id
  select e).Single();

I've not done much with analyzers, so I don't know how tricky any of this might be. It sounds like a fun problem to try and solve, though!

rbanks54 avatar Oct 20 '24 23:10 rbanks54

Just some thoughts...

If the analyzer is applied to all linq queries regardless of their usage, it could break other people's code where they have non-EF queries. Probably not the best outcome 🙂

If the analyzer can detect an EF query by checking if the class being queried inherits from DBContext that would be wonderful. This is this is likely harder that the simple scenario I added as the comparison could be defined in a helper method or specification object that generates a where clause. I don't know how easy it is chasing that back up the call tree to see where it might be used.

The analyzer would also need to check the other form of EF querying as the syntax tree is different; i.e.

var x = (from e in context.Entities
  where e.Id == id
  select e).Single();

I've not done much with analyzers, so I don't know how tricky any of this might be. It sounds like a fun problem to try and solve, though!

Hi @rbanks54 - I've had a first attempt at this. I'll release an alpha version later today if you wouldn't mind playing about with it? It looks for invocation expressions named Where, Select TakeWhile, SkipWhile etc. It then checks to see if they're being called on anything that derives from DbSet. It then looks at each sub-expression to see if the left value is a value object and the right value is a primitive (int).

It seems to work quite well. Good point re. the other syntax for querying. I just tried that and it doesn't work! I'll see if I can get it working.

Many thanks.

SteveDunn avatar Oct 21 '24 10:10 SteveDunn

It's looking like a good start :)

image

Both line 127 and 130 in the screenshot are detected as problems.

The last approach (lines 132 to 135), where I split up a query, doesn't get detected as a problem though :(

Because I split the line up, the type for step1 is DbSet<EmployeeEntity>? rather than the non-nullable type. You might need to check for nullable DbSets in the analyzer if you haven't already done so.

When we get to the where clause, the type is just an IQueryable<EmployeeEntity>?. The DbSet typing has been lost due to the fluent chaining.

[Edited: I had a corrected screenshot in the original message]

rbanks54 avatar Oct 21 '24 21:10 rbanks54

When we get to the where clause, the type is just an IQueryable<EmployeeEntity>?. The DbSet typing has been lost due to the fluent chaining.

I would be very cautious about extending the analyzer to IQueryable<> since that would necessarily include other db engines and even linq to objects .AsQueryable() into the detection, which could lead to false positives.

viceroypenguin avatar Oct 21 '24 21:10 viceroypenguin

When we get to the where clause, the type is just an IQueryable?. The DbSet typing has been lost due to the fluent chaining.

I would be very cautious about extending the analyzer to IQueryable<> since that would necessarily include other db engines and even linq to objects .AsQueryable() into the detection, which could lead to false positives.

Thanks for the steer @viceroypenguin . It first checks for IQueryable and then checks to see if it's specifically a DBSet in the efcore namespace, so I think we should be good.

SteveDunn avatar Oct 21 '24 21:10 SteveDunn

It's looking like a good start :)

image

Both line 127 and 130 in the screenshot are detected as problems.

The last approach (lines 132 to 135), where I split up a query, doesn't get detected as a problem though :(

Because I split the line up, the type for step1 is DbSet<EmployeeEntity>? rather than the non-nullable type. You might need to check for nullable DbSets in the analyzer if you haven't already done so.

When we get to the where clause, the type is just an IQueryable<EmployeeEntity>?. The DbSet typing has been lost due to the fluent chaining.

Thanks for the quick feedback! I'll take a look at those (very cunning) scenarios!

SteveDunn avatar Oct 21 '24 21:10 SteveDunn

I can get it to work for:

var step1 = context.Entities;
var step2 = step1.Where(e => e.Id == id);

But, I can't get it to work for the extra level of indirection in:

var step1 = context.Entities;
var step2 = step1.Take(4);
var step3 = step2.Where(e => e.Id == id);

step2 changes from DbSet<> to IQueryable<>.

SteveDunn avatar Oct 27 '24 21:10 SteveDunn

Would a recursive check up the call tree be viable?

i.e. if the analyzer finds the source is an IQueryable, then use SymbolFinder.FindDeclarationsAsync get the IQueryable's definition and and see if it, in turn, is using a DbSet or IQueryable as it's source?

rbanks54 avatar Oct 28 '24 00:10 rbanks54

Apparently, SymbolFinder isn't available to analysers. There's 'data flow analysis', but there's not much documentation on it, and the stuff that there is, I'm too stupid to understand! :)

SteveDunn avatar Oct 29 '24 05:10 SteveDunn

I spent a fair amount of time trying to get the analyzer to spot the violation on the last line.

 using var ctx = new DbContext();
 
 DbSet<EmployeeEntity> step1 = ctx.Entities;
 var step2 = step1.Take(4);
 var entities = step2.Where(e => {|#0:e.Age == 50|});

It proved difficult (for me, anyway!): the SymbolFinder isn't available, so I went with the approach that similar method-level analyzers take, which follows these steps:

  • RegisterCompilationStartAction - and do 'expensive' stuff there, like getting types by metadata and creating a lookup table
  • RegisterOperationBlockStartAction - where the 'block' matches a method
  • RegisterOperationAction - where it gets notified of ExpressionsStatement, PropertyReference, and Invocation` operations.

For each operation, depending on what it is, it looks up the associated 'sides' of the expression (if it's a property access, is the property a DbSet?, if it's an expression, is it a binary expression and is one of the sides a DbSet or IQueryable, and if it's IQueryable, does it originate from something else that relates back to a DbSet?)

There was a couple of issues with this approach:

  1. it was slow, or at least looked slow, as it was doing a lot of work
  2. I don't think it would've covered other issues, like a DbSet being passed to the method, but where the parameter is IQueryable<> - I don't know of any way that an analyzer could distinguish the difference, but I could be wrong.

I'll close this one for now as I think it's better to have basic functionality than none at all. I'll revisit this one if I learn something that might help implement it more efficiently in the future.

SteveDunn avatar Oct 30 '24 21:10 SteveDunn