efcore icon indicating copy to clipboard operation
efcore copied to clipboard

Strange memory behaviour with large number of columns

Open Steve887 opened this issue 1 year ago • 9 comments

I have run into a strange issue with our app that results in strange memory usage: spiky memory loads and increasing memory usage over time, when there is a large number of properties in the model being selected. What's particularly strange, if I comment out just one property, the memory issues no longer occur and memory usage is extremely consistent.

A requirement of our system is to change connection strings at runtime, so instead of using Services.AddDbContext, I am registering the data context with Autofac and passing in a connection string at runtime, then using an overridden OnConfiguring to setup the SqlServer provider. If I change this to use Services.AddDbContext then memory does return the normal, but I'm confused why this would only seem to have an effect at a large number of properties.

The rest of my setup is very normal, with a new entities being selected with Includes.

I have attached an app that reproduces the issue, the steps are as follows:

  1. Open the memory test solution
  2. Start the MemoryTest.Api project. This will create a database on a (localdb)\MSSQLLocalDB database, so change this for SqlExpress or other server etc in Program.cs.
  3. Start memory profiling with your favourite program (I used dotMemory)
  4. Run the StressTestApi.ps1 file. This will execute an API call against the endpoint constantly to simulate a load.

Running the application as is, results in a memory graph as follows: image

Open Item.cs and comment out the ItemImage property. Then open ImageMap.cs and comment out the ItemImage property mapping. Start the application again, attach the memory profiler and rerun the powershell script. This results in the following memory graph: image

The big differences seem to be in the gen 1 and 2 heaps, although taking memory snapshots doesn't really reveal anything obvious. The total memory also grows over time when the column is there.

While it's easy enough to say, just decrease the model size, I am working with a large, legacy, model and cannot make big changes like that. I would also like the know the underlying reason why the memory behaviour changes so drastically just by changing one column. I would expect if the setup is wrong for it to happen all the time.

This also occurs in .Net 7 and EFCore 7 versions, on windows and linux.

Please let me know if there's any more information to supply.

MemoryTest.zip

Include provider and version information

EF Core version: 8.0.8 Database provider: Microsoft.EntityFrameworkCore.SqlServer Target framework: .Net 8 Operating system: Windows

Steve887 avatar Aug 26 '24 04:08 Steve887

Before taking a look at your repro, it's well-known that SqlClient has some severe memory/performance issues with reading large binary columns asynchronously (https://github.com/dotnet/SqlClient/issues/593), so that could explain the memory behavior when adding your "image" property. This should be easily verifiable by switching to sync I/O (SaveChanges() instead of SaveChangesAsync()) as a test - can you please do that?

roji avatar Aug 26 '24 07:08 roji

Before taking a look at your repro, it's well-known that SqlClient has some severe memory/performance issues with reading large binary columns asynchronously (dotnet/SqlClient#593), so that could explain the memory behavior when adding your "image" property. This should be easily verifiable by switching to sync I/O (SaveChanges() instead of SaveChangesAsync()) as a test - can you please do that?

It's not a binary column, it's just a 50 character string. It also doesn't have any data in it. image

Steve887 avatar Aug 26 '24 08:08 Steve887

@Steve887 Do you see the same behavior if you change the query to be no-tracking? For example:

return await _context.Set<VisitView>()
	.Include(x => x.ConsultationViews).ThenInclude(x => x.ConsultationItems).ThenInclude(x => x.Item)
	.AsNoTracking()
	.FirstOrDefaultAsync(p => p.VisitNumber == key);

ajcvickers avatar Aug 27 '24 11:08 ajcvickers

@ajcvickers Hi, tried adding this and memory behaviour is similar. In fact total memory is actually higher. image

Steve887 avatar Aug 28 '24 01:08 Steve887

@ajcvickers is there any more information I can provide for this one?

Steve887 avatar Sep 02 '24 04:09 Steve887

try using splitquery , might works

return await _context.Set<VisitView>() .Include(x => x.ConsultationViews).ThenInclude(x => x.ConsultationItems).ThenInclude(x => x.Item) .AsSplitQuery() .FirstOrDefaultAsync(p => p.VisitNumber == key);

satviktechie1986 avatar Sep 12 '24 10:09 satviktechie1986

@satviktechie1986 Setting AsSplitQuery does result in normal memory usage. However, in our actual app this isn't really a good solution as we have many queries across the app, and we don't want the dramatic increase in network calls enabling this setting globally would result in.

I would still like to find out why the original query has such a dramatic difference by simply including one extra property, so I can take those findings and implement in our main application.

Steve887 avatar Sep 13 '24 05:09 Steve887

alternative you can use

return await _context.Set<VisitView>() .AsNoTracking() .Where(p => p.VisitNumber == key) .Select(v => new { v.VisitNumber, ConsultationViews = v.ConsultationViews.Select(cv => new { cv.Id, ConsultationItems = cv.ConsultationItems.Select(ci => new { ci.Id, ci.Item.Name }) }) }) .FirstOrDefaultAsync();

satviktechie1986 avatar Sep 13 '24 05:09 satviktechie1986

@satviktechie1986 again, this wouldn't work for our actual app as we have too many queries to realistically select just the required columns

Steve887 avatar Sep 13 '24 06:09 Steve887

@Steve887 apologies for taking so long to look into this.

Unfortunately, I can't reproduce an issue here. Past warmup, the program seems completely stable at around ~50MB - see the following memory profiler screenshot:

Image

Now, it's important to note that my database is empty, since you haven't provided any data or guidance on what to generate; performance would likely be very different if the database was populated. Specifically, your query include two collection includes, which are known to cause potential perf issues, depending on the kind of data you have in your database (docs).

In any case, I'm not seeing anything specific in your OnConfiguring that looks like it would lead to problematic memory usage. If you're still seeing issues, you'll have to provide a repro that allows me to see the actual problem on my end.

roji avatar Dec 15 '24 21:12 roji

@roji I'm overseas at the moment and can't double check, but I'm sure my original repro zip file had seed data in it that would setup a sql local dB with the appropriate data to reproduce the issue. If not, I will have to update it in a couple weeks when I am back home.

I will also test this in dotnet and efcore 9 now they are in public release to see if the issue still occurs.

Steve887 avatar Dec 15 '24 23:12 Steve887

@Steve887 apologies, I made some simplifications yesterday when trying out your repro, which made the memory usage go away. And you're also right - your migrations do include some seed data.

A requirement of our system is to change connection strings at runtime, so instead of using Services.AddDbContext, I am registering the data context with Autofac and passing in a connection string at runtime, then using an overridden OnConfiguring to setup the SqlServer provider. If I change this to use Services.AddDbContext then memory does return the normal, but I'm confused why this would only seem to have an effect at a large number of properties.

After some digging, I can see that when using OnConfiguring, you happen to call EnableRetryingStrategy, but not in the AddDbContext variant (which doesn't have the high memory requirements). Once I remove that line from OnConfiguring (or add it to AddDbContxt), memory pressure is the same: high when a retrying strategy is used, low otherwise.

In other words, this looks like a case of internal buffering being done by EF - see these docs for more information. I'm assuming that the collection includes cause quite a bit of data to be returned, and then the additional buffering caused by EnableRetryingStrategy is the cause of the increased memory pressure. The more fields you have on your query's principal entity type (VisitView), the more duplication occurs because of the collection include, and then that's made worse by the buffering. This is also why split query mitigates the issue, by removing the duplication.

Since the situation seems like it's clear and (currently) by design, and you're overseas for the next few weeks, I'll go ahead and close this issue for now. But when you're back, definitely take a look and report back if the above corresponds with what you're seeing. If not, I can reopen the issue and investigate further.

roji avatar Dec 16 '24 08:12 roji