EntityFramework-Plus
EntityFramework-Plus copied to clipboard
IncludeOptimized slows down queries
I noticed an interesting performance issue while trying to take advantage of IncludeOptimized. Contrary to my expectation, IncludeOptimized slows down my queries instead of speeding them up.
My use case is a report generator in a real world system that pulls out data for a ~70.000 row .csv file. The code performs 8 Load calls, each with a bunch of Include calls attached to them. According to a profiling, these Load calls take 63 seconds in the current implementation, with hotspots as follows:
- 37 seconds: What appears to be the underlying database queries:
System.Data.Common.DbCommand.ExecuteReader(CommandBehavior) - 23 seconds: What appears to be EF materialization:
System.Data.Entity.Core.Common.Internal.Materialization.Shaper+ObjectQueryNestedEnumerator`1.MoveNext() - 3 seconds: What appears to be EF materialization:
System.Data.Entity.Core.Common.Internal.Materialization.Shaper+SimpleEnumerator`1.MoveNext()
Replacing the Include calls with IncludeOptimized increases the total time to 87 seconds, with these hotspots:
- 14 seconds: The underlying database queries. Now 23 seconds faster:
System.Data.Common.DbCommand.ExecuteReader(CommandBehavior) - 20 seconds: The first materialization path from before, taking approximately the same time:
System.Data.Entity.Core.Common.Internal.Materialization.Shaper+ObjectQueryNestedEnumerator`1.MoveNext() - 25 seconds: The other materialization path from before, now significantly slower:
System.Data.Entity.Core.Common.Internal.Materialization.Shaper+SimpleEnumerator`1.MoveNext() - 27 seconds: A lazy loading related hotspot that was not present in the original test. Note that lazy loading was disabled for both tests!
Z.EntityFramework.Plus.QueryIncludeOptimizedLazyLoading.SetLazyLoaded(Object, List)
In other words: The 23 second improvement in query time entailed an additional 19 seconds of materialization time and 27 seconds of SetLazyLoaded.
Is this a performance issue with Entity Framework Plus, or is my use case just not appropriate for the use of IncludeOptimized?
Hello @Zero3 ,
Thank for reporting.
Yes, sometime IncludeOptimized might be slower but that's strange in your case.
The first thing I would like to check is why SetLazyLoaded happen in IncludeOptimized.
Is it possible for you to send us a test project with this behavior?
Best Regards,
Jonathan
Performance Libraries
context.BulkInsert(list, options => options.BatchSize = 1000);
Entity Framework Extensions • Entity Framework Classic • Bulk Operations • Dapper Plus • LinqToSql Plus
Runtime Evaluation
Eval.Execute("x + y", new {x = 1, y = 2}); // return 3
C# Eval Function • SQL Eval Function
@JonathanMagnan You are welcome :).
Unfortunately, I cannot provide you with a test project, as this code is part of a large commercial closed-source product. But I can assist with debugging, testing etc. Let me know!
@JonathanMagnan In case it is of any use, here are the settings set on the context prior to execution, as well as one of the Load calls. The other calls are pretty identical in structure.
context.Database.CommandTimeout = 60 * 15;
context.Configuration.LazyLoadingEnabled = false;
context.Configuration.AutoDetectChangesEnabled = false;
context.Orders
.Where(order => order.DateCreated < reportStartInstant)
.Where(order => order.DateCreated >= fromDate && order.DateCreated < toDate)
.Select(order => order.Item)
.OfType<ZoneVoucherItem>()
.Include(item => item.DeliveryMethod)
.Include(item => item.ZoneInformation)
.Include(item => item.PassengerType)
.Load();
Hello @Zero3,
Thank for this information but it's possible for you to create a standalone project? I understand you cannot share the commercial source application but these class can be easily copied in another project with most field removed.
Using the provided example, it's possible on your side to create a standalone project that otherwise my developer will exactly have to do.
We now always ask for a sample project since most issues are either missing some essential information, have some syntax errors or are resolved by the requestor when creating it
Having a test project allows us to give faster support and a better experience to users of our free libraries.
Best Regards,
Jonathan
@JonathanMagnan I understand why you are requesting this, and I might be able to spend some time on trying to create a minimum broken example at some point. But this is not possible for me at the moment, so I'm afraid the details above will have to do.
Maybe someone else can provide an example project if they run into this issue as well :).
Hello @Zero3 ,
This issue will be closed since we haven't heard back from you. Feel free to open a new issue if you find time to create the project.
Best Regards,
Jonathan
@JonathanMagnan That is a shame, now that I spent a lot of time tracking down the issue, writing this pretty detailed bug report, and offered to assist with debugging/testing on a real-world system. All because I don't currently have even more time available, for setting up a separate test project for you.
This is definitely not encouraging me to spend any time contributing here in the future...
Hello @Zero3 ,
If you can provide something, I will be happy to reopen it and look at it (I can even reopen it right now).
However, some request like this one requires a project since we cannot anymore afford to do as much on our side.
People often are not aware of the cost to maintain a FREE project. It costs me on average $150 to pay a developer to make and complete such request. That’s why we ask now for a project, to reduce this cost.
Hope you understand.
Best Regards,
Jonathan
@JonathanMagnan I have a huge respect for your development and maintenance of Entity Framework Plus as a free library. And I understand that you might not currently be able to look further into this issue until someone provides a test project. All I am sad about is that you chose to close the issue instead of leaving it open, after all the work I already put into debugging and reporting it.
Hello @Zero3 ,
As said, I don't mind to reopen it, I just cannot pass more time at this issue anymore unless someone provides us with more information. While this project is free on your side, that's not the same on my side, I need to pay developers to look and try to reproduce it.
So I will reopen the issue and hope someone could give us some additional hint.
Best Regards,
Jonathan
Hi @JonathanMagnan & @Zero3,
I'm having a similar problem where IncludeOptimized is unexpectedly slower than Include. Given the history of this issue it seemed I should have a go at reproducing in a simple little sample project, so I've done that and attached. It's a Console application with output like:
Clear DB: 00:00:02.7041862
Generate test data: 00:00:00.0343448
Insert test data: 00:00:53.2880158
Warm up: 00:00:09.0352818
>> Include: 00:00:05.3542852 total, 00:00:00.3041450 DB command execution.
>> IncludeOptimized: 00:00:06.0759931 total, 00:00:00.0003531 DB command execution.
Press any key to continue . . .
Hello @cduff ,
Thank you for reporting.
It sometimes happens for a very long query like yours. As statement is currently executing separated, performing joining multiple time might cause IncludeOptimized to go slower.
The good news is that we talked last week how to improve IncludeOptimized for SQL Server. If time permits next week, we will try to change the strategy to use CTE instead.
We highly believe that query like your will highly benefit from it.
Thanks @JonathanMagnan for your quick response and your organisation's efforts on this project!
However I want to ensure that the correct issue is understood. If you look at the sample output of my Console application you'll see that the DB command execution time of IncludeOptimized is already 861 times faster than the DB command execution time of Include, and in both cases the DB command execution time is a tiny fraction of the overall time. Thus the DB query time is not a problem here and using a CTE wouldn't really help.
Instead it seems the problem lies in the performance of the processing of the DB query results. For example I briefly did some CPU sampling and saw similar results to those originally reported by @Zero3, e.g. a bottleneck could be
Z.EntityFramework.Plus.QueryIncludeOptimizedLazyLoading.SetLazyLoaded
Oh thank,
We will look at it as well. A lot of code might use Expression Tree instead of reflection which would lead to performance improvement after the first compile.
We will look further on this issue using your project.
Thanks @JonathanMagnan. Re:
A lot of code might use Expression Tree instead of reflection which would lead to performance improvement after the first compile.
I think my Console application takes this into account as it executes the queries to 'warm up' before they are executed again for performance measurement.
Hello @cduff ,
We investigated the issue and we find out that almost 90% of the time is taken by Entity Framework during the materialization of the queries.
Commenting the code for Z.EntityFramework.Plus.QueryIncludeOptimizedLazyLoading.SetLazyLoaded make almost no difference.
I don't think anyway can be done at this moment for the materialization since it's taken directly from Entity Framework.
Thanks @JonathanMagnan for looking into this. It's a bit confusing though as IncludeOptimized is slower than Entity Framework even though it has a faster DB command execution time. So if they both have to materialize the same result then IncludeOptimized is losing performance somewhere vs. Entity Framework...
On my side, is still slightly faster ;(
Is it always slower on your side?
Is it always slower on your side?
Yes. I just added some GC.Collect() calls to try to minimise the chance that some garbage collection was affecting results and then ran 5 times with the following average results:
- Include: 5.39s
- IncludeOptimized: 5.66s
Here are some example of my result
Warm up: 00:00:06.9428095
>> Include: 00:00:04.4415727 total, 00:00:00.1106871 DB command execution.
>> IncludeOptimized: 00:00:03.9408031 total, 00:00:00.0037523 DB command execution.
Warm up: 00:00:07.5438154
>> Include: 00:00:04.0474855 total, 00:00:00.1124663 DB command execution.
>> IncludeOptimized: 00:00:03.7944993 total, 00:00:00.0036124 DB command execution.
Hmmm that is interesting and I'm not sure why that'd be the case.
For the closed source commercial software that I'm actually trying to use it for I'm getting measurements like:
- Include: 21.6s (5.1s DB command execution).
- IncludeOptimized: 24.6s (0.008s DB command execution).
So I'm getting behaviour much like originally reported by @Zero3 but perhaps my attempt to reproduce via a simple Console application has been unsuccessful. :(
Yup,
This was a good try, however ;)
The good thing is we will re-visit this feature during the year through the project: https://entityframework-classic.net/
So some part will be optimized and we might find out how to better handle this or have our own optimized entity materialization