opentelemetry-collector icon indicating copy to clipboard operation
opentelemetry-collector copied to clipboard

use hierarchical_memory_limit instead of limit_in_bytes with cgroup v1

Open Nicholaswang opened this issue 3 years ago • 19 comments

Signed-off-by: Nicholaswang [email protected]

Fixing a bug: When using limit_percentage config for memory_limiter, it can get an incorrect limit value since the memory.limit_in_bytes is set to unlimited, and it gets the memory limit of the container node: image

the container node meminfo: 企业微信截图_16467551589129

configure limit_percentage to 75 and spike_limit_percentage to 20: 企业微信截图_16467539509973

with the latest code, it will get the memory limit of the node which is 192658Mb of memory limit: 企业微信截图_1646753927335

after fixing: use hierarchical_memory_limit instead of limit_in_bytes with cgroup v1, it will get the container memory limit: 4096Mb: 企业微信截图_16467541014594

this is a common issue for projects using cgroups memory limit, such as https://github.com/VictoriaMetrics/VictoriaMetrics/issues/84 VictoriaMetrics also fix this issue with hierarychical_memory_limit in memory.stat

Link to tracking Issue: https://github.com/open-telemetry/opentelemetry-collector/issues/4972

Nicholaswang avatar Mar 08 '22 16:03 Nicholaswang

CLA Signed

The committers listed above are authorized under a signed CLA.

  • :white_check_mark: login: Nicholaswang / name: nicholaswang (9ed09c9af00b1158288ab7eb5249af6a2005866e)

Sure

Nicholaswang avatar Mar 23 '22 15:03 Nicholaswang

@dashpole unit tests have been extented.

Nicholaswang avatar Mar 25 '22 02:03 Nicholaswang

@dashpole does this LGTM?

bogdandrutu avatar Apr 07 '22 14:04 bogdandrutu

Yes, it lgtm

dashpole avatar Apr 07 '22 14:04 dashpole

Codecov Report

Merging #4973 (d273cb6) into main (e092fc7) will increase coverage by 0.00%. The diff coverage is 92.50%.

@@           Coverage Diff           @@
##             main    #4973   +/-   ##
=======================================
  Coverage   92.30%   92.31%           
=======================================
  Files         212      212           
  Lines       13263    13303   +40     
=======================================
+ Hits        12243    12280   +37     
- Misses        803      805    +2     
- Partials      217      218    +1     
Impacted Files Coverage Δ
internal/iruntime/total_memory_linux.go 40.00% <40.00%> (ø)
internal/cgroups/cgroup.go 92.50% <100.00%> (+3.61%) :arrow_up:
internal/cgroups/cgroups.go 90.52% <100.00%> (+2.85%) :arrow_up:

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

codecov[bot] avatar Apr 07 '22 15:04 codecov[bot]

Hi, @bogdandrutu @dashpole The pr is still open, do I need to increase unittest coverage?

Nicholaswang avatar Apr 20 '22 12:04 Nicholaswang

Can you rebase? It looks like some tests are failing, and those may have been fixed.

dashpole avatar Apr 20 '22 13:04 dashpole

PR rebased.

jpkrohling avatar Apr 20 '22 13:04 jpkrohling

Hi, @dashpole @jpkrohling Do I need to fix the failing tasks before merged?

Nicholaswang avatar Apr 24 '22 02:04 Nicholaswang

Yes. If they are caused by intermittent tests, we can restart, but it looks like you need to improve the test coverage (if possible) and add a changelog entry. If you don't think a changelog entry is required for this change and the reviewer agrees with you, I can add a label to skip this check.

jpkrohling avatar Apr 25 '22 14:04 jpkrohling

ok, I will improve the coverage

Nicholaswang avatar Apr 27 '22 03:04 Nicholaswang

This PR was marked stale due to lack of activity. It will be closed in 14 days.

github-actions[bot] avatar May 11 '22 03:05 github-actions[bot]

@jpkrohling added two more cases to imprive coverage.

Nicholaswang avatar May 16 '22 08:05 Nicholaswang

@dashpole @jpkrohling Hi, seems like it needs to be retriggered

Nicholaswang avatar May 19 '22 02:05 Nicholaswang

This PR was marked stale due to lack of activity. It will be closed in 14 days.

github-actions[bot] avatar Jun 22 '22 03:06 github-actions[bot]

This PR was marked stale due to lack of activity. It will be closed in 14 days.

github-actions[bot] avatar Jul 29 '22 03:07 github-actions[bot]

This PR was marked stale due to lack of activity. It will be closed in 14 days.

github-actions[bot] avatar Aug 19 '22 03:08 github-actions[bot]

This PR was marked stale due to lack of activity. It will be closed in 14 days.

github-actions[bot] avatar Sep 09 '22 03:09 github-actions[bot]

This PR was marked stale due to lack of activity. It will be closed in 14 days.

github-actions[bot] avatar Sep 24 '22 04:09 github-actions[bot]

This PR was marked stale due to lack of activity. It will be closed in 14 days.

github-actions[bot] avatar Oct 10 '22 04:10 github-actions[bot]

@dashpole, @dmitryax, do you know what's pending for this PR?

jpkrohling avatar Oct 18 '22 19:10 jpkrohling

It looks like it somehow includes a bunch of unrelated commits. I was on board with the original set of changes, but the PR contents has to be fixed first.

dashpole avatar Oct 18 '22 19:10 dashpole

@Nicholaswang, are you still interested in working on this one?

jpkrohling avatar Oct 19 '22 19:10 jpkrohling

This PR was marked stale due to lack of activity. It will be closed in 14 days.

github-actions[bot] avatar Nov 03 '22 03:11 github-actions[bot]

Closed as inactive. Feel free to reopen if this PR is still being worked on.

github-actions[bot] avatar Nov 22 '22 03:11 github-actions[bot]