Docker-Provider

OMS Agent high memory usage

Open · MrImpossibru opened this issue 2 years ago • 21 comments

Hello,

Is there any way to reduce omsagent memory consumption in a Kubernetes cluster? For just 2 nodes it runs 3 instances of omsagent (a daemonset with 2 instances plus a replicaset with 1 instance), and each instance uses about 300 MB of RAM. This is the most demanding service in my cluster, and it is just a monitoring tool.

Why is the replicaset even required? It just adds one more instance on a node where the daemonset has already created one.
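For anyone who wants to reproduce the numbers, here is a minimal sketch that prints each agent container's memory request and limit. It assumes the official `kubernetes` Python client, a working kubeconfig, and that the agent pods are named with an `omsagent` prefix, which is worth verifying on your cluster:

```python
# Minimal sketch: list the omsagent pods in kube-system with the memory each
# container requests and limits. Assumes the official `kubernetes` Python
# client, a working kubeconfig, and an "omsagent" pod-name prefix (assumption).
from kubernetes import client, config

config.load_kube_config()
v1 = client.CoreV1Api()

for pod in v1.list_namespaced_pod(namespace="kube-system").items:
    if not pod.metadata.name.startswith("omsagent"):
        continue
    for c in pod.spec.containers:
        requests = c.resources.requests or {}
        limits = c.resources.limits or {}
        print(f"{pod.metadata.name}/{c.name}: "
              f"request={requests.get('memory', '-')} "
              f"limit={limits.get('memory', '-')}")
```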

Reopening because the last issue was closed without a solution (it was closed with a fix for a different problem that appeared after the original issue was created): https://github.com/microsoft/Docker-Provider/issues/694

MrImpossibru · Jul 26 '22 09:07

Hi @MrImpossibru, the replicaset (a singleton pod) is for cluster-level monitoring information, such as the data collected in KubePodInventory, KubeNodeInventory, KubeEvents, etc.

ganga1980 · Aug 02 '22 03:08

Hi @MrImpossibru, the replicaset (a singleton pod) is for cluster-level monitoring information, such as the data collected in KubePodInventory, KubeNodeInventory, KubeEvents, etc.

Hi @ganga1980, is it possible to let the first/any/all daemonset pod(s) do that instead? And is it possible to keep omsagent from reserving such a large amount of RAM?

MrImpossibru · Aug 08 '22 06:08

This issue is stale because it has been open 7 days with no activity. Remove stale label or comment or this will be closed in 5 days.

github-actions[bot] · Aug 15 '22 10:08

up

MrImpossibru · Aug 16 '22 20:08

@ganga1980 Update: the Prometheus agent uses 20 MB of RAM after the update, but it still reserves 225 MB (the memory request set on the pod). Is it possible to do something about that?

P.S. On the 9th of August there was some Kubernetes update after which more RAM was reserved / less became available. I haven't figured out yet which service caused this.
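For completeness, the used-vs-reserved gap can be checked with a minimal sketch like the one below. It assumes metrics-server is running and the official `kubernetes` Python client is available; matching pods by the `omsagent` name prefix is an assumption:

```python
# Sketch: pull live memory usage from the metrics-server API so the ~20 MB in
# use can be compared with the ~225 MB request. Assumes metrics-server is
# running; the "omsagent" pod-name prefix is an assumption.
from kubernetes import client, config

config.load_kube_config()
metrics = client.CustomObjectsApi().list_namespaced_custom_object(
    group="metrics.k8s.io", version="v1beta1",
    namespace="kube-system", plural="pods",
)
for item in metrics["items"]:
    name = item["metadata"]["name"]
    if not name.startswith("omsagent"):
        continue
    for c in item["containers"]:
        # usage is reported by metrics-server, typically in Ki
        print(f"{name}/{c['name']}: usage={c['usage']['memory']}")
```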

MrImpossibru · Aug 17 '22 05:08

This issue is stale because it has been open 7 days with no activity. Remove stale label or comment or this will be closed in 5 days.

github-actions[bot] · Aug 25 '22 10:08

Up

MrImpossibru · Aug 25 '22 11:08

This issue is stale because it has been open 7 days with no activity. Remove stale label or comment or this will be closed in 5 days.

github-actions[bot] · Sep 03 '22 10:09

up

MrImpossibru · Sep 03 '22 11:09

up

MrImpossibru · Sep 10 '22 10:09

@ganga1980 Update: the Prometheus agent uses 20 MB of RAM after the update, but it still reserves 225 MB (the memory request set on the pod). Is it possible to do something about that?

P.S. On the 9th of August there was some Kubernetes update after which more RAM was reserved / less became available. I haven't figured out yet which service caused this.

In the current semester we plan to integrate the Vertical Pod Autoscaler so that both requests and limits are scaled automatically. With that, requests and limits will start around 20 MB, which should address your ask. Regarding perf, we are continuing to improve it and this will be ongoing work. Let me know if you have any other follow-up questions; otherwise we can close.
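For illustration only (this is not the shipped integration), a Vertical Pod Autoscaler targeting the replicaset deployment could look roughly like the sketch below. It assumes the VPA CRDs (autoscaling.k8s.io/v1) are installed in the cluster, and the deployment name `omsagent-rs` is an assumption that may differ between agent versions:

```python
# Rough sketch, not the shipped integration: a VerticalPodAutoscaler that lets
# the VPA recommender resize the replicaset pod's requests/limits. Assumes the
# VPA CRDs are installed; "omsagent-rs" is the assumed deployment name.
from kubernetes import client, config

config.load_kube_config()

vpa = {
    "apiVersion": "autoscaling.k8s.io/v1",
    "kind": "VerticalPodAutoscaler",
    "metadata": {"name": "omsagent-rs-vpa", "namespace": "kube-system"},
    "spec": {
        "targetRef": {
            "apiVersion": "apps/v1",
            "kind": "Deployment",
            "name": "omsagent-rs",  # assumed name of the replicaset deployment
        },
        "updatePolicy": {"updateMode": "Auto"},
    },
}

client.CustomObjectsApi().create_namespaced_custom_object(
    group="autoscaling.k8s.io", version="v1",
    namespace="kube-system", plural="verticalpodautoscalers", body=vpa,
)
```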

ganga1980 · Sep 15 '22 03:09

@ganga1980 I will close the topic only after this is fixed. The main problem is in the first message: omsagent uses too much memory, and there are many instances of it per cluster. This makes 4 GB RAM VMs in the cluster almost unusable - the default pods use almost all the RAM on such nodes.

MrImpossibru · Sep 15 '22 06:09

This issue is stale because it has been open 7 days with no activity. Remove stale label or comment or this will be closed in 5 days.

github-actions[bot] · Sep 22 '22 10:09

Up

MrImpossibru · Sep 22 '22 11:09

This issue is stale because it has been open 7 days with no activity. Remove stale label or comment or this will be closed in 5 days.

github-actions[bot] · Sep 30 '22 10:09

Up

MrImpossibru · Sep 30 '22 11:09

@MrImpossibru, as I mentioned earlier, perf improvements will be ongoing. Since you are referring to the memory usage of all the default pods on a 4 GB node, and those pods are owned by different teams, that needs to be followed up with those teams separately.

ganga1980 · Oct 04 '22 01:10

@ganga1980 OMS Agent is the most memory-consuming service. And I found that its memory consumption depends on the node's RAM size - it looks like it reserves some percentage of the node's total RAM.

MrImpossibru · Oct 04 '22 12:10

This issue is stale because it has been open 7 days with no activity. Remove stale label or comment or this will be closed in 5 days.

github-actions[bot] · Oct 12 '22 10:10

Up

MrImpossibru · Oct 12 '22 10:10

Up

MrImpossibru · Oct 20 '22 06:10

@ganga1980 OMS Agent is the most memory-consuming service. And I found that its memory consumption depends on the node's RAM size - it looks like it reserves some percentage of the node's total RAM.

@MrImpossibru, memory consumption is not based on the node's available memory; rather, it is based on data collection. We use open-source products such as Telegraf, Fluent Bit and Fluentd in our agent, and those processes are what consume this memory. For example, if the cluster has a lot of Kubernetes resources, the replicaset pod has to fetch and parse all of that resource data; similarly, if a node has a high volume of container logs, the daemonset consumes the memory required to process those logs. If you are available, let's have a quick discussion on this and see how we can help.
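In the meantime, one concrete lever is narrowing what the agent collects, for example by excluding noisy namespaces from container-log collection via the container-azm-ms-agentconfig ConfigMap. Below is a rough sketch using the official `kubernetes` Python client; the key and setting names follow the documented agent-config template but should be verified against the template for your agent version, and `noisy-namespace` is a placeholder:

```python
# Sketch: since agent memory scales with the volume of data collected, narrow
# container-log collection through the container-azm-ms-agentconfig ConfigMap
# read from kube-system. Verify the key/setting names against the ConfigMap
# template for your agent version; "noisy-namespace" is a placeholder.
from kubernetes import client, config

config.load_kube_config()

log_settings = """\
[log_collection_settings]
  [log_collection_settings.stdout]
    enabled = true
    exclude_namespaces = ["kube-system", "noisy-namespace"]
  [log_collection_settings.stderr]
    enabled = true
    exclude_namespaces = ["kube-system", "noisy-namespace"]
"""

cm = client.V1ConfigMap(
    metadata=client.V1ObjectMeta(
        name="container-azm-ms-agentconfig", namespace="kube-system"),
    data={"schema-version": "v1",
          "log-data-collection-settings": log_settings},
)
client.CoreV1Api().create_namespaced_config_map(namespace="kube-system", body=cm)
```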

ganga1980 · Oct 26 '22 02:10

Adding an "Up" for @MrImpossibru; omsagent is by far the largest container running on our AKS clusters too.

neilrees · Oct 26 '22 11:10

Memory consumption is not based on the node's available memory; rather, it is based on data collection

Sorry, I was wrong. I checked it again using 2 fresh clusters with different node sizes - consumption was almost equal.

MrImpossibru · Oct 28 '22 20:10

This issue is stale because it has been open 7 days with no activity. Remove stale label or comment or this will be closed in 5 days.

github-actions[bot] · Nov 05 '22 10:11

Up

MrImpossibru · Nov 05 '22 21:11

up

MrImpossibru · Nov 11 '22 16:11

This issue is stale because it has been open 7 days with no activity. Remove stale label or comment or this will be closed in 5 days.

github-actions[bot] · Nov 19 '22 10:11

Up

MrImpossibru · Nov 25 '22 07:11

This issue is stale because it has been open 7 days with no activity. Remove stale label or comment or this will be closed in 5 days.

github-actions[bot] · Dec 03 '22 10:12