opentelemetry-collector
opentelemetry-collector copied to clipboard
use hierarchical_memory_limit instead of limit_in_bytes with cgroup v1
Signed-off-by: Nicholaswang [email protected]
Fixing a bug:
When using limit_percentage config for memory_limiter, it can get an incorrect limit value since the memory.limit_in_bytes is set to unlimited, and it gets the memory limit of the container node:

the container node meminfo:

configure limit_percentage to 75 and spike_limit_percentage to 20:

with the latest code, it will get the memory limit of the node which is 192658Mb of memory limit:

after fixing: use hierarchical_memory_limit instead of limit_in_bytes with cgroup v1, it will get the container memory limit: 4096Mb:

this is a common issue for projects using cgroups memory limit, such as https://github.com/VictoriaMetrics/VictoriaMetrics/issues/84
VictoriaMetrics also fix this issue with hierarychical_memory_limit in memory.stat
Link to tracking Issue: https://github.com/open-telemetry/opentelemetry-collector/issues/4972
The committers listed above are authorized under a signed CLA.
- :white_check_mark: login: Nicholaswang / name: nicholaswang (9ed09c9af00b1158288ab7eb5249af6a2005866e)
Sure
@dashpole unit tests have been extented.
@dashpole does this LGTM?
Yes, it lgtm
Codecov Report
Merging #4973 (d273cb6) into main (e092fc7) will increase coverage by
0.00%. The diff coverage is92.50%.
@@ Coverage Diff @@
## main #4973 +/- ##
=======================================
Coverage 92.30% 92.31%
=======================================
Files 212 212
Lines 13263 13303 +40
=======================================
+ Hits 12243 12280 +37
- Misses 803 805 +2
- Partials 217 218 +1
| Impacted Files | Coverage Δ | |
|---|---|---|
| internal/iruntime/total_memory_linux.go | 40.00% <40.00%> (ø) |
|
| internal/cgroups/cgroup.go | 92.50% <100.00%> (+3.61%) |
:arrow_up: |
| internal/cgroups/cgroups.go | 90.52% <100.00%> (+2.85%) |
:arrow_up: |
Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.
Hi, @bogdandrutu @dashpole The pr is still open, do I need to increase unittest coverage?
Can you rebase? It looks like some tests are failing, and those may have been fixed.
PR rebased.
Hi, @dashpole @jpkrohling Do I need to fix the failing tasks before merged?
Yes. If they are caused by intermittent tests, we can restart, but it looks like you need to improve the test coverage (if possible) and add a changelog entry. If you don't think a changelog entry is required for this change and the reviewer agrees with you, I can add a label to skip this check.
ok, I will improve the coverage
This PR was marked stale due to lack of activity. It will be closed in 14 days.
@jpkrohling added two more cases to imprive coverage.
@dashpole @jpkrohling Hi, seems like it needs to be retriggered
This PR was marked stale due to lack of activity. It will be closed in 14 days.
This PR was marked stale due to lack of activity. It will be closed in 14 days.
This PR was marked stale due to lack of activity. It will be closed in 14 days.
This PR was marked stale due to lack of activity. It will be closed in 14 days.
This PR was marked stale due to lack of activity. It will be closed in 14 days.
This PR was marked stale due to lack of activity. It will be closed in 14 days.
@dashpole, @dmitryax, do you know what's pending for this PR?
It looks like it somehow includes a bunch of unrelated commits. I was on board with the original set of changes, but the PR contents has to be fixed first.
@Nicholaswang, are you still interested in working on this one?
This PR was marked stale due to lack of activity. It will be closed in 14 days.
Closed as inactive. Feel free to reopen if this PR is still being worked on.