Docker-Provider

OMS Agent high memory usage

Open · MrImpossibru opened this issue 2 years ago • 21 comments

Hello,

Is there any way to reduce omsagent memory consumption in a Kubernetes cluster? For just 2 nodes it runs 3 instances of omsagent (a daemonset with 2 instances plus a replicaset with 1 instance), and each instance uses about 300 MB of RAM. This is the most demanding service in my cluster, and it is just a monitoring tool.

Why is the replicaset even required? It just adds one more instance on a node where the daemonset has already created one.
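For anyone who wants to reproduce the numbers, here is a minimal sketch that prints each agent container's memory request and limit. It assumes the official `kubernetes` Python client, a working kubeconfig, and that the agent pods are named with an `omsagent` prefix, which is worth verifying on your cluster:

```python
# Minimal sketch: list the omsagent pods in kube-system with the memory each
# container requests and limits. Assumes the official `kubernetes` Python
# client, a working kubeconfig, and an "omsagent" pod-name prefix (assumption).
from kubernetes import client, config

config.load_kube_config()
v1 = client.CoreV1Api()

for pod in v1.list_namespaced_pod(namespace="kube-system").items:
    if not pod.metadata.name.startswith("omsagent"):
        continue
    for c in pod.spec.containers:
        requests = c.resources.requests or {}
        limits = c.resources.limits or {}
        print(f"{pod.metadata.name}/{c.name}: "
              f"request={requests.get('memory', '-')} "
              f"limit={limits.get('memory', '-')}")
```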

Reopening because the last issue was closed without a solution (it was closed with a fix for a different problem that appeared after the original issue was created): https://github.com/microsoft/Docker-Provider/issues/694

MrImpossibru · Jul 26 '22 09:07

Hi @MrImpossibru, the replicaset (a singleton pod) is for cluster-level monitoring information, such as the data collected in KubePodInventory, KubeNodeInventory, KubeEvents, etc.

ganga1980 · Aug 02 '22 03:08

Hi @MrImpossibru, the replicaset (a singleton pod) is for cluster-level monitoring information, such as the data collected in KubePodInventory, KubeNodeInventory, KubeEvents, etc.

Hi @ganga1980, is it possible to let the first/any/all daemonset pod(s) do that instead? And is it possible to keep omsagent from reserving such a large amount of RAM?

MrImpossibru · Aug 08 '22 06:08

This issue is stale because it has been open 7 days with no activity. Remove stale label or comment or this will be closed in 5 days.

github-actions[bot] · Aug 15 '22 10:08

up

MrImpossibru · Aug 16 '22 20:08

@ganga1980 Update: the Prometheus agent uses 20 MB of RAM after the update, but it still reserves 225 MB (the memory request set on the pod). Is it possible to do something about that?

P.S. On the 9th of August there was some Kubernetes update after which more RAM was reserved / less became available. I haven't figured out yet which service caused this.
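For completeness, the used-vs-reserved gap can be checked with a minimal sketch like the one below. It assumes metrics-server is running and the official `kubernetes` Python client is available; matching pods by the `omsagent` name prefix is an assumption:

```python
# Sketch: pull live memory usage from the metrics-server API so the ~20 MB in
# use can be compared with the ~225 MB request. Assumes metrics-server is
# running; the "omsagent" pod-name prefix is an assumption.
from kubernetes import client, config

config.load_kube_config()
metrics = client.CustomObjectsApi().list_namespaced_custom_object(
    group="metrics.k8s.io", version="v1beta1",
    namespace="kube-system", plural="pods",
)
for item in metrics["items"]:
    name = item["metadata"]["name"]
    if not name.startswith("omsagent"):
        continue
    for c in item["containers"]:
        # usage is reported by metrics-server, typically in Ki
        print(f"{name}/{c['name']}: usage={c['usage']['memory']}")
```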

MrImpossibru · Aug 17 '22 05:08

This issue is stale because it has been open 7 days with no activity. Remove stale label or comment or this will be closed in 5 days.

github-actions[bot] · Aug 25 '22 10:08

Up

MrImpossibru · Aug 25 '22 11:08

This issue is stale because it has been open 7 days with no activity. Remove stale label or comment or this will be closed in 5 days.

github-actions[bot] · Sep 03 '22 10:09

up

MrImpossibru · Sep 03 '22 11:09

up

MrImpossibru · Sep 10 '22 10:09

@ganga1980 Update: the Prometheus agent uses 20 MB of RAM after the update, but it still reserves 225 MB (the memory request set on the pod). Is it possible to do something about that?

P.S. On the 9th of August there was some Kubernetes update after which more RAM was reserved / less became available. I haven't figured out yet which service caused this.

In the current semester we plan to integrate the Vertical Pod Autoscaler so that both requests and limits are scaled automatically. With that, requests and limits will start around 20 MB, which should address your ask. Regarding perf, we are continuing to improve it and this will be ongoing work. Let me know if you have any other follow-up questions; otherwise we can close.
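For illustration only (this is not the shipped integration), a Vertical Pod Autoscaler targeting the replicaset deployment could look roughly like the sketch below. It assumes the VPA CRDs (autoscaling.k8s.io/v1) are installed in the cluster, and the deployment name `omsagent-rs` is an assumption that may differ between agent versions:

```python
# Rough sketch, not the shipped integration: a VerticalPodAutoscaler that lets
# the VPA recommender resize the replicaset pod's requests/limits. Assumes the
# VPA CRDs are installed; "omsagent-rs" is the assumed deployment name.
from kubernetes import client, config

config.load_kube_config()

vpa = {
    "apiVersion": "autoscaling.k8s.io/v1",
    "kind": "VerticalPodAutoscaler",
    "metadata": {"name": "omsagent-rs-vpa", "namespace": "kube-system"},
    "spec": {
        "targetRef": {
            "apiVersion": "apps/v1",
            "kind": "Deployment",
            "name": "omsagent-rs",  # assumed name of the replicaset deployment
        },
        "updatePolicy": {"updateMode": "Auto"},
    },
}

client.CustomObjectsApi().create_namespaced_custom_object(
    group="autoscaling.k8s.io", version="v1",
    namespace="kube-system", plural="verticalpodautoscalers", body=vpa,
)
```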

ganga1980 · Sep 15 '22 03:09

@ganga1980 I will close the topic only after this is fixed. The main problem is in the first message: omsagent uses too much memory, and there are many instances of it per cluster. This makes 4 GB RAM VMs in the cluster almost unusable - the default pods use almost all the RAM on such nodes.

MrImpossibru · Sep 15 '22 06:09

This issue is stale because it has been open 7 days with no activity. Remove stale label or comment or this will be closed in 5 days.

github-actions[bot] · Sep 22 '22 10:09

Up

MrImpossibru · Sep 22 '22 11:09

This issue is stale because it has been open 7 days with no activity. Remove stale label or comment or this will be closed in 5 days.

github-actions[bot] · Sep 30 '22 10:09

Up

MrImpossibru · Sep 30 '22 11:09

@MrImpossibru, as I mentioned earlier, perf improvements will be ongoing. Since you are referring to the memory usage of all the default pods on a 4 GB node, and those pods are owned by different teams, that needs to be followed up with those teams separately.

ganga1980 · Oct 04 '22 01:10

@ganga1980 OMS Agent is the most memory-consuming service. And I found that its memory consumption depends on the node's RAM size - it looks like it reserves some percentage of the node's total RAM.

MrImpossibru · Oct 04 '22 12:10

This issue is stale because it has been open 7 days with no activity. Remove stale label or comment or this will be closed in 5 days.

github-actions[bot] · Oct 12 '22 10:10

Up

MrImpossibru · Oct 12 '22 10:10

Up

MrImpossibru · Oct 20 '22 06:10

@ganga1980 OMS Agent is the most memory-consuming service. And I found that its memory consumption depends on the node's RAM size - it looks like it reserves some percentage of the node's total RAM.

@MrImpossibru, memory consumption is not based on the node's available memory; rather, it is based on data collection. We use open-source products such as Telegraf, Fluent Bit and Fluentd in our agent, and those processes are what consume this memory. For example, if the cluster has a lot of Kubernetes resources, the replicaset pod has to fetch and parse all of that resource data; similarly, if a node has a high volume of container logs, the daemonset consumes the memory required to process those logs. If you are available, let's have a quick discussion on this and see how we can help.
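In the meantime, one concrete lever is narrowing what the agent collects, for example by excluding noisy namespaces from container-log collection via the container-azm-ms-agentconfig ConfigMap. Below is a rough sketch using the official `kubernetes` Python client; the key and setting names follow the documented agent-config template but should be verified against the template for your agent version, and `noisy-namespace` is a placeholder:

```python
# Sketch: since agent memory scales with the volume of data collected, narrow
# container-log collection through the container-azm-ms-agentconfig ConfigMap
# read from kube-system. Verify the key/setting names against the ConfigMap
# template for your agent version; "noisy-namespace" is a placeholder.
from kubernetes import client, config

config.load_kube_config()

log_settings = """\
[log_collection_settings]
  [log_collection_settings.stdout]
    enabled = true
    exclude_namespaces = ["kube-system", "noisy-namespace"]
  [log_collection_settings.stderr]
    enabled = true
    exclude_namespaces = ["kube-system", "noisy-namespace"]
"""

cm = client.V1ConfigMap(
    metadata=client.V1ObjectMeta(
        name="container-azm-ms-agentconfig", namespace="kube-system"),
    data={"schema-version": "v1",
          "log-data-collection-settings": log_settings},
)
client.CoreV1Api().create_namespaced_config_map(namespace="kube-system", body=cm)
```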

ganga1980 · Oct 26 '22 02:10

Adding an "Up" for @MrImpossibru; omsagent is by far the largest container running on our AKS clusters too.

neilrees · Oct 26 '22 11:10

Memory consumption is not based on the node's available memory; rather, it is based on data collection

Sorry, I was wrong. I checked it again using 2 fresh clusters with different node sizes - consumption was almost equal.

MrImpossibru · Oct 28 '22 20:10

This issue is stale because it has been open 7 days with no activity. Remove stale label or comment or this will be closed in 5 days.

github-actions[bot] · Nov 05 '22 10:11

Up

MrImpossibru · Nov 05 '22 21:11

up

MrImpossibru · Nov 11 '22 16:11

This issue is stale because it has been open 7 days with no activity. Remove stale label or comment or this will be closed in 5 days.

github-actions[bot] · Nov 19 '22 10:11

Up

MrImpossibru · Nov 25 '22 07:11

This issue is stale because it has been open 7 days with no activity. Remove stale label or comment or this will be closed in 5 days.

github-actions[bot] · Dec 03 '22 10:12