SQL Server detects physical RAM rather than pod limit in v1.25.5

Open adoprog opened this issue 2 years ago • 24 comments

Describe the bug

We are running the same workloads on multiple clusters, most of them on 1.24.6 and the newest one on 1.25.5 (it was on 1.25.4 earlier today, with the same issue).

SQL Server (mcr.microsoft.com/mssql/server:2019-latest) reports the following message on startup:

2023-02-06 22:00:36.56 Server Detected **51449** MB of RAM. This is an informational message; no user action is required.

Later it fails with an OOM error, even though the RAM limit is set to "4Gi".

To Reproduce

  1. Deploy SQL Server to the cluster v1.25.5
  2. Run a heavy workload that uses RAM, e.g. install a DACPAC on SQL Server.

Expected behavior

SQL Server detects the proper amount of RAM.

Environment (please complete the following information):

  • Kubernetes version: 1.25.5

adoprog avatar Feb 06 '23 22:02 adoprog

Action required from @Azure/aks-pm

ghost avatar Mar 09 '23 01:03 ghost

I've pinged the SQL team: https://github.com/microsoft/mssql-docker/issues/814

justindavies avatar Mar 10 '23 19:03 justindavies

Thanks. It works fine in AKS clusters on 1.24.6, so this could be an issue in AKS, not in SQL Server.

adoprog avatar Mar 13 '23 17:03 adoprog

This seems to be related: https://github.com/Azure/AKS/issues/3443#issuecomment-1471746964

This might help some of you, Kubernetes 1.25 included an update to use cgroups v2 api (cgroups is basically how Kubernetes passes settings to the containers)...
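For anyone who wants to confirm which cgroup version a node or container is actually using, here is a minimal sketch. It relies on the filesystem type mounted at /sys/fs/cgroup (cgroup2fs for v2, tmpfs for v1); the helper name cgroup_version_for_fstype is made up for illustration, and `stat -fc` assumes GNU coreutils.

```shell
#!/bin/sh
# Hypothetical helper: map the filesystem type of /sys/fs/cgroup
# to a cgroup version. cgroup2fs means v2, tmpfs means v1.
cgroup_version_for_fstype() {
  case "$1" in
    cgroup2fs) echo "v2" ;;
    tmpfs)     echo "v1" ;;
    *)         echo "unknown" ;;
  esac
}

# Only query the real filesystem when the path exists (i.e. on Linux).
if [ -d /sys/fs/cgroup ]; then
  fstype=$(stat -fc %T /sys/fs/cgroup/ 2>/dev/null || echo "unknown")
  echo "cgroup $(cgroup_version_for_fstype "$fstype")"
fi
```

Running this inside the SQL Server pod (for example via kubectl exec) should show whether the 1.25 cluster is on v2 while the 1.24 clusters are still on v1.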

adoprog avatar Mar 20 '23 19:03 adoprog

I too am having this issue on microk8s 1.25 and 1.26. Any resolutions?

brewsteropsdev avatar Apr 06 '23 16:04 brewsteropsdev

Action required from @Azure/aks-pm

ghost avatar May 06 '23 19:05 ghost

Issue needing attention of @Azure/aks-leads

ghost avatar May 22 '23 00:05 ghost

Issue needing attention of @Azure/aks-leads

ghost avatar Jun 06 '23 00:06 ghost

Any updates on this? We are running into the same problem after upgrading our AKS cluster to version 1.25.6

dinilimento avatar Jun 08 '23 15:06 dinilimento

Issue needing attention of @Azure/aks-leads

ghost avatar Jun 23 '23 18:06 ghost

This issue has to be solved by the SQL Server team. SQL Server does not take the changes between cgroup v1 and cgroup v2 into account. On cgroup v1, SQL Server works fine. With cgroup v2, it fails to limit itself to the resources available.

mdebruijnbf avatar Jun 26 '23 05:06 mdebruijnbf

In Ubuntu 20.x, cgroup v1 is supported. In Ubuntu 22.x, cgroup v2 is supported. cgroup v2 is a breaking change. (Among other things, cgroups configure the maximum memory a process is allowed to use.)

After upgrading AKS to 1.25+, the AKS hosts use cgroup v2 (https://learn.microsoft.com/en-us/azure/aks/supported-kubernetes-versions?tabs=azure-cli#aks-components-breaking-changes-by-version).

SQL Server is currently (June 2023) based on Ubuntu 20.x and does not support cgroup v2. Running a SQL Server container on AKS 1.25+ therefore causes problems that only Microsoft can solve, by upgrading SQL Server to Ubuntu 22+.

I've created a temporary workaround to prevent OOMKilled errors on AKS 1.25: add a /var/opt/mssql/mssql.conf file that sets the maximum memory SQL Server is allowed to use. Example file, followed by the commands that generate it:

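[memory]
memorylimitmb = 8096

# Read the cgroup v2 memory limit (in bytes) for this container.
_cgroupmax=$(cat /sys/fs/cgroup/memory.max)
# Convert bytes to MB, then keep 80% for SQL Server.
_max_memory_in_mb=$(( (_cgroupmax / 1024 / 1024) / 10 * 8 ))
# Write the limit to mssql.conf (this overwrites any existing file).
echo "[memory]" > /var/opt/mssql/mssql.conf
echo "memorylimitmb = $_max_memory_in_mb" >> /var/opt/mssql/mssql.conf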

With cgroup v2, the file /sys/fs/cgroup/memory.max contains the maximum amount of memory, in bytes, that the container's processes may use. The math limits SQL Server to 80% of that amount, which is the ratio Microsoft uses with cgroup v1.
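One edge case worth guarding against: when no memory limit is set on the pod, memory.max contains the literal string "max" instead of a number, and the arithmetic would then produce a bogus limit. A minimal sketch of the same 80% calculation with that check added; compute_limit_mb is a hypothetical helper name, not part of any Microsoft tooling.

```shell
#!/bin/sh
# Hypothetical helper: turn a cgroup v2 memory.max value (bytes) into a
# memorylimitmb value, using the same 80% rule as the workaround.
compute_limit_mb() {
  raw="$1"
  # memory.max holds "max" when no limit is set; reject anything
  # that is empty or not purely numeric so no bogus limit is produced.
  case "$raw" in
    ''|*[!0-9]*) return 1 ;;
  esac
  echo $(( raw / 1024 / 1024 / 10 * 8 ))
}

# Print the config fragment only when a numeric limit is available.
if limit=$(compute_limit_mb "$(cat /sys/fs/cgroup/memory.max 2>/dev/null)"); then
  printf '[memory]\nmemorylimitmb = %s\n' "$limit"
fi
```

For a "4Gi" limit, memory.max is 4294967296, so the integer math gives 4096 / 10 * 8 = 3272 MB.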

How you add this conf file depends on your situation. We build our own SQL Server container based on the mssql containers and run these commands during container startup, before SQL Server starts.
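As a rough illustration of that startup step, a wrapper script along these lines could be used as the container entrypoint. This is a sketch under assumptions, not our exact script: the function name write_memory_conf is made up, and /opt/mssql/bin/sqlservr is assumed to be the SQL Server binary path in the image.

```shell
#!/bin/sh
# Hypothetical entrypoint wrapper: write mssql.conf, then start SQL Server.
# Paths are parameters so the function can be tried outside a container.
write_memory_conf() {
  max_file="$1"   # e.g. /sys/fs/cgroup/memory.max (cgroup v2)
  conf_file="$2"  # e.g. /var/opt/mssql/mssql.conf
  [ -r "$max_file" ] || return 0    # no cgroup v2 file: leave config alone
  raw=$(cat "$max_file")
  case "$raw" in
    ''|*[!0-9]*) return 0 ;;        # "max" means no limit was set
  esac
  limit_mb=$(( raw / 1024 / 1024 / 10 * 8 ))
  # Note: this overwrites any existing mssql.conf.
  printf '[memory]\nmemorylimitmb = %s\n' "$limit_mb" > "$conf_file"
}

# Assumed binary path; only hand over to SQL Server when it is present.
if [ -x /opt/mssql/bin/sqlservr ]; then
  write_memory_conf /sys/fs/cgroup/memory.max /var/opt/mssql/mssql.conf
  exec /opt/mssql/bin/sqlservr
fi
```

Using exec keeps SQL Server as the main container process, so signals from Kubernetes reach it directly. If you already ship an mssql.conf with other settings, merge the [memory] section instead of overwriting the file.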

mdebruijnbf avatar Jun 28 '23 15:06 mdebruijnbf

Issue needing attention of @Azure/aks-leads

ghost avatar Jul 13 '23 18:07 ghost

Issue needing attention of @Azure/aks-leads

ghost avatar Jul 28 '23 18:07 ghost

@Azure/aks-leads Any updates?

martijndebruijn avatar May 21 '24 05:05 martijndebruijn

Issue needing attention of @Azure/aks-leads
