snowflake-connector-net

OOM on AKS (docker container) in version above 1.2.6

Open ppiwow-apay opened this issue 2 years ago • 9 comments

Issue description

On the local machine everything works fine on every version, but when we deploy it in a Docker image and release it to AKS, new SnowflakeConnection causes an OOM on the pod. From our investigation it seems the ctor of this class allocates a huge amount of memory.

Does this connector support usage in a Unix Docker image?

All works fine till version 1.2.6.

Both net 5 and net 6 have this issue. If you need any additional information let me know as there

ppiwow-apay avatar Mar 25 '22 07:03 ppiwow-apay

We are running into this issue as well. Had to revert to an older version of the driver. Very frustrating.

jimhoonan avatar Aug 19 '22 19:08 jimhoonan

any information? are you going to look at it at all?

ppiwow-apay avatar Sep 20 '22 09:09 ppiwow-apay

@ppiwow-apay None of the tests cover AKS, but I don't think that should be relevant. If there is a memory leak then that should have surfaced on other environments. Do you happen to have a heap dump that you can share exhibiting the memory leak you're reporting?

sfc-gh-wfateem avatar Sep 20 '22 14:09 sfc-gh-wfateem

@ppiwow-apay I also recognize that we have been late in responding to you, so sorry about that. Given that your post was back in March, can you clarify what version of the .NET driver you were using at the time and have you been able to reproduce that same behavior with the latest version?

sfc-gh-wfateem avatar Sep 20 '22 15:09 sfc-gh-wfateem

The last version I tried was the newest one at the time. I could try once more, but I need some time to update and get memory dumps. All versions above 1.2.6 (we updated from time to time) were affected.

Locally, whether directly on the machine or in Docker, everything seems to work fine; the issue occurs only on AKS.

ppiwow-apay avatar Sep 21 '22 07:09 ppiwow-apay

@sfc-gh-jfan We have seen some similar behavior. I have been testing this recently with v2.0.16 and Dapper 2.0.123 on dotnet 6.

Our simple app spins up 20 tasks and runs this in a loop. When using dapper, the memory usage of the app running in k8s more than doubled over 6 hours.

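using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;
using Dapper;
using Snowflake.Data.Client;

public class SnowflakeRepository : IRepository
{
    public async Task<IList<Entity>> GetEntities(long entityId, CancellationToken token)
    {
        // A new connection is created, opened and disposed on every call.
        await using var conn = new SnowflakeDbConnection();
        conn.ConnectionString = "account=***;user=***;password=***;db=***;role=***;warehouse=***;schema=***";
        await conn.OpenAsync(token);

        var query = "select * from presentation.vw_entities where EntityId = :entityId;";

        // Dapper takes the entityId parameter value from the anonymous object.
        var entityResults = await conn.QueryAsync<Entity>(query, new { entityId });

        return entityResults.ToList();
    }
}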

jblackburn21 avatar Sep 21 '22 20:09 jblackburn21

Hi, in our tests we found that the memory leak occurs when calling the ctor:

new SnowflakeDbConnection();

We ran tests that did nothing but create new connections, and that was enough to reproduce it.
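A connection-only test of the kind described here could look roughly like the sketch below. It is not the code from this thread: the iteration count, the reporting interval and the ToMb helper are assumptions made for illustration, and it only uses the parameterless SnowflakeDbConnection constructor plus standard .NET APIs. It needs the Snowflake.Data NuGet package and C# 9 or later (top-level statements).

```csharp
using System;
using System.Diagnostics;
using Snowflake.Data.Client; // from the Snowflake.Data NuGet package

// Hypothetical repro: construct and dispose connections without ever opening them,
// and watch whether process memory keeps growing.
const int iterations = 1_000;   // arbitrary value chosen for this sketch
const int reportEvery = 100;    // arbitrary reporting interval

for (var i = 1; i <= iterations; i++)
{
    using (var conn = new SnowflakeDbConnection())
    {
        // Intentionally empty: only the constructor and Dispose run.
    }

    if (i % reportEvery == 0)
    {
        // A fresh Process object gives a fresh snapshot of the counters.
        using var process = Process.GetCurrentProcess();
        Console.WriteLine(
            $"{i,5} connections: " +
            $"PrivateMemory {ToMb(process.PrivateMemorySize64):F2} MB, " +
            $"WorkingSet {ToMb(process.WorkingSet64):F2} MB, " +
            $"ManagedMemory {ToMb(GC.GetTotalMemory(forceFullCollection: false)):F2} MB");
    }
}

// Illustrative helper (not part of any library): bytes to megabytes.
static double ToMb(long bytes) => bytes / (1024.0 * 1024.0);
```

Running the same binary against two driver versions inside the same container image, with the same memory limit, would show whether the growth depends on the package version alone.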

ppiwow-apay avatar Sep 28 '22 07:09 ppiwow-apay

I can confirm what @ppiwow-apay is reporting. We trimmed down our sample to remove Dapper and use a minimal setup, and we are seeing the same behavior.

jblackburn21 avatar Sep 28 '22 15:09 jblackburn21

I am seeing the exact same behavior. Stripped down the application to the basics and seeing the same memory issues.

mattcalt avatar Oct 11 '22 14:10 mattcalt

any news on that?

ppiwow-apay avatar Dec 05 '22 13:12 ppiwow-apay

@sfc-gh-igarish any news on that?

sfc-gh-jtang avatar May 25 '23 17:05 sfc-gh-jtang

I have some questions. It looks like their garbage collection didn't work as usual. Can the customer check their garbage collector settings? And do you know how much free memory they have inside that Docker container?
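One way to answer both questions from inside the container is to print what the runtime itself reports. The following is a minimal sketch using only standard .NET APIs (GCSettings and GC.GetGCMemoryInfo); the output labels are made up for illustration. GCMemoryInfo describes the state at the most recent collection, so the sketch forces one first.

```csharp
using System;
using System.Runtime;

// Force one collection so that GetGCMemoryInfo() has a populated snapshot.
GC.Collect();
var info = GC.GetGCMemoryInfo();

const double MiB = 1024.0 * 1024.0;

Console.WriteLine($"Server GC:            {GCSettings.IsServerGC}");
Console.WriteLine($"Latency mode:         {GCSettings.LatencyMode}");
Console.WriteLine($"Processor count:      {Environment.ProcessorCount}");
// Memory the GC considers available to it at the time of the last collection.
Console.WriteLine($"Total available:      {info.TotalAvailableMemoryBytes / MiB:F0} MiB");
Console.WriteLine($"High-load threshold:  {info.HighMemoryLoadThresholdBytes / MiB:F0} MiB");
Console.WriteLine($"Current memory load:  {info.MemoryLoadBytes / MiB:F0} MiB");
Console.WriteLine($"Managed heap size:    {info.HeapSizeBytes / MiB:F0} MiB");
```

Comparing this output between a local Docker run and the AKS pod would show whether the two environments differ in GC flavor or in the memory the runtime thinks it can use.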

I found something maybe related to this OOM issue.

https://github.com/dotnet/runtime/issues/58974

In that issue, they report that on .NET 5 and 6 the test passes with Workstation GC but fails with Server GC, so the customer may be hitting the same problem. It also says that .NET 5 and higher need more heap.

For example, the following heap count is 10; they may have to increase this.

It also says they needed 3190 MB for the GC test to pass.
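For reference, GC flavor and heap limits can be set explicitly in a runtimeconfig.template.json file next to the project file. The fragment below is only a sketch: the setting names are the documented .NET runtime configuration knobs, but the values are assumptions chosen for illustration (268435456 bytes is 256 MiB), not recommendations from this thread.

```json
{
  "configProperties": {
    "System.GC.Server": false,
    "System.GC.HeapHardLimit": 268435456
  }
}
```

The heap count itself is controlled by System.GC.HeapCount, which the .NET documentation describes as applying to Server GC only, so it would have no effect on a service running Workstation GC.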

I hope this can help.

At the same time we continue looking into it.

sfc-gh-igarish avatar May 26 '23 22:05 sfc-gh-igarish

@sfc-gh-igarish - @ppiwow-apay and I work together.

We're using workstation GC everywhere, so it's not an issue with server GC. Various services have different RAM limits, but overall it's in the 256-512 MiB range.

Please keep in mind that 1.2.6 keeps working for us with no issues, but as soon as we upgrade just this package (same runtime image), OOMs happen, so I wouldn't put the blame on the runtime.

edit: I don't think this was stated anywhere earlier plainly, but we're running the containers on Azure Kubernetes Service, on linux worker nodes. Base image is mcr.microsoft.com/dotnet/aspnet:6.0 so it's Debian 11 AMD64 OS.

amis92 avatar Jun 01 '23 12:06 amis92

There's a repro repository I made 2 years ago: https://github.com/amis92/net-snowflake-memoryleak

amis92 avatar Jun 01 '23 13:06 amis92

I have run the Docker images based on the repro repository locally and found no big differences between several driver versions. The 2.0.25 driver version was run on .NET 6, whereas the rest of the tests were run on .NET 5.

It seems to be rather consistent:

Using snowflake Snowflake.Data, Version=1.2.4.0, Culture=neutral, PublicKeyToken=null
Opening connection
- PrivateMemory 55.46 MB
- WorkingSet 24.41 MB
- ManagedMemory 0.10 MB
Building command
- PrivateMemory 119.64 MB
- WorkingSet 62.91 MB
- ManagedMemory 0.93 MB
Executing command
- PrivateMemory 119.70 MB
- WorkingSet 62.91 MB
- ManagedMemory 0.94 MB
Executed command
- PrivateMemory 120.65 MB
- WorkingSet 65.00 MB
- ManagedMemory 1.20 MB
Connection disposed.
- PrivateMemory 128.88 MB
- WorkingSet 65.00 MB
- ManagedMemory 1.26 MB

Using snowflake Snowflake.Data, Version=1.2.6.0, Culture=neutral, PublicKeyToken=null
Opening connection
- PrivateMemory 55.46 MB
- WorkingSet 24.31 MB
- ManagedMemory 0.10 MB
Building command
- PrivateMemory 119.84 MB
- WorkingSet 64.17 MB
- ManagedMemory 0.98 MB
Executing command
- PrivateMemory 119.89 MB
- WorkingSet 64.17 MB
- ManagedMemory 0.99 MB
Executed command
- PrivateMemory 122.94 MB
- WorkingSet 68.08 MB
- ManagedMemory 1.38 MB
Connection disposed.
- PrivateMemory 122.99 MB
- WorkingSet 68.30 MB
- ManagedMemory 1.44 MB

Using snowflake Snowflake.Data, Version=2.0.3.0, Culture=neutral, PublicKeyToken=null
Opening connection
- PrivateMemory 55.47 MB
- WorkingSet 24.27 MB
- ManagedMemory 0.10 MB
Building command
- PrivateMemory 131.25 MB
- WorkingSet 66.73 MB
- ManagedMemory 0.98 MB
Executing command
- PrivateMemory 131.29 MB
- WorkingSet 66.73 MB
- ManagedMemory 0.99 MB
Executed command
- PrivateMemory 132.24 MB
- WorkingSet 68.83 MB
- ManagedMemory 1.38 MB
Connection disposed.
- PrivateMemory 132.30 MB
- WorkingSet 68.83 MB
- ManagedMemory 1.44 MB

Using snowflake Snowflake.Data, Version=2.0.10.0, Culture=neutral, PublicKeyToken=null
Opening connection
- PrivateMemory 55.47 MB
- WorkingSet 24.38 MB
- ManagedMemory 0.10 MB
Building command
- PrivateMemory 131.71 MB
- WorkingSet 68.16 MB
- ManagedMemory 1.04 MB
Executing command
- PrivateMemory 131.75 MB
- WorkingSet 68.16 MB
- ManagedMemory 1.05 MB
Executed command
- PrivateMemory 135.72 MB
- WorkingSet 73.07 MB
- ManagedMemory 1.44 MB
Connection disposed.
- PrivateMemory 135.81 MB
- WorkingSet 73.30 MB
- ManagedMemory 1.49 MB

Using snowflake Snowflake.Data, Version=2.0.25.0, Culture=neutral, PublicKeyToken=null
Opening connection
- PrivateMemory 65.33 MB
- WorkingSet 22.90 MB
- ManagedMemory 0.09 MB
Building command
- PrivateMemory 135.98 MB
- WorkingSet 67.18 MB
- ManagedMemory 1.16 MB
Executing command
- PrivateMemory 136.00 MB
- WorkingSet 67.18 MB
- ManagedMemory 1.18 MB
Executed command
- PrivateMemory 137.11 MB
- WorkingSet 69.11 MB
- ManagedMemory 1.64 MB
Connection disposed.
- PrivateMemory 137.11 MB
- WorkingSet 69.11 MB
- ManagedMemory 1.64 MB

The OOM in AKS may be related to unexpected behavior within the dotnet runtime even when using the Workstation GC, as pointed out in dotnet/runtime#49317, where the GC does not behave as the user expects.

One of the comments there notes that switching to the Alpine version of the dotnet image fixed their problem, whereas another comment advises tinkering with the GC settings.

I hope this can help you with your issue.

sfc-gh-pbulawa avatar Jul 20 '23 09:07 sfc-gh-pbulawa

hey all - seems like the issue is not reproducible with the recent versions of the Snowflake .NET driver and also seems to be related to an unexpected behaviour of the runtime itself.

therefore I'm now marking this issue as closed, but please feel free to comment if you have a reproduction scenario that provides evidence of a bug or unexpected behaviour in recent versions of the Snowflake .NET driver, and then I'll reopen it and we can continue troubleshooting.

sfc-gh-dszmolka avatar Jul 24 '23 08:07 sfc-gh-dszmolka