mesos
Expose new metrics for memory usage in the container.
The metric "mem_kmem_usage_bytes" is the total kernel memory usage by processes in the cgroup in bytes.
The metric "mem_kmem_tcp_usage_bytes" is the total memory usage for TCP buffers in bytes.
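As a rough illustration of where these two numbers could come from, here is a minimal Python sketch that reads them from a cgroup v1 memory controller directory. It is not the patch itself: the function name `read_kmem_metrics` and the mapping from metric names to `memory.kmem.usage_in_bytes` and `memory.kmem.tcp.usage_in_bytes` are assumptions made for the example.

```python
from pathlib import Path

# Assumed mapping from metric name to cgroup v1 control file.
KMEM_FILES = {
    "mem_kmem_usage_bytes": "memory.kmem.usage_in_bytes",
    "mem_kmem_tcp_usage_bytes": "memory.kmem.tcp.usage_in_bytes",
}


def read_kmem_metrics(cgroup_dir):
    """Return the kmem metrics found under cgroup_dir as integers.

    Files that are missing, unreadable, or not a plain integer are
    skipped rather than reported as zero.
    """
    metrics = {}
    for metric, filename in KMEM_FILES.items():
        path = Path(cgroup_dir) / filename
        try:
            metrics[metric] = int(path.read_text().strip())
        except (OSError, ValueError):
            continue
    return metrics
```

Calling `read_kmem_metrics` on a container's memory cgroup directory would return a dict with whichever of the two metrics could be read.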
I don't think we turn on kmem accounting yet in the Mesos containerizer.
As I remember, on old kernels you need to set memory.kmem.limit_in_bytes once to enable kmem accounting. I am not sure about the behavior on newer kernels.
Also, if we turn on kmem accounting, we need to be careful about bugs like this one on old kernels: https://github.com/opencontainers/runc/issues/1725
We've been running with this patch at Twitter since February and we're getting kmem metrics.
The mentioned bug is interesting; we've only been running kernels 4.9 and later.
I am not sure if the kernel behavior has changed or not in newer kernels. For example: https://github.com/opencontainers/runc/blob/7139b61f7fdb904d0acb8db825709aa8d2d2ef36/libcontainer/cgroups/fs/memory.go#L70
You'll have to write memory.kmem.limit_in_bytes to enable kmem accounting
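To make the "write memory.kmem.limit_in_bytes" step concrete, here is a small Python sketch. The function name `enable_kmem_accounting` is hypothetical, and the default of writing a small limit followed by `-1` (unlimited) is an assumption about one way to trigger accounting without leaving a real limit in place. Nothing in this thread confirms that sequence or how a given kernel reacts to it.

```python
from pathlib import Path


def enable_kmem_accounting(cgroup_dir, values=("1", "-1")):
    """Write each value in turn to memory.kmem.limit_in_bytes.

    The default sequence (a tiny limit, then -1 for unlimited) is an
    assumption for illustration; adjust it for the kernel in use.
    """
    limit_file = Path(cgroup_dir) / "memory.kmem.limit_in_bytes"
    for value in values:
        limit_file.write_text(value)
```

On kernels where the write is only accepted before the cgroup has tasks or children, this would need to run right after the cgroup is created.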
So if kmem accounting is not enabled, I don't know what will happen if you read the data from memory.kmem.usage_in_bytes. Let me do some testing on my CentOS 7 default kernel (3.10).
It looks like on CentOS 7 (3.10.0-693.5.2.el7.x86_64), if kmem accounting is not enabled, memory.kmem.limit_in_bytes will always show 0, and reading memory.kmem.slabinfo gives an Input/output error.
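The slabinfo observation suggests one possible probe for whether kmem accounting is active: try to read memory.kmem.slabinfo and treat an Input/output error (EIO) as "not enabled". The sketch below assumes that behavior, which was only observed on the one CentOS 7 kernel mentioned here and may differ elsewhere. The function name `kmem_accounting_enabled` is hypothetical, and treating a missing file as "not enabled" is an extra assumption.

```python
import errno
from pathlib import Path


def kmem_accounting_enabled(cgroup_dir):
    """Guess whether kmem accounting is enabled for a cgroup.

    Returns False when memory.kmem.slabinfo is missing or reading it
    fails with EIO, True when the read succeeds. Other errors are
    re-raised so they are not mistaken for "disabled".
    """
    slabinfo = Path(cgroup_dir) / "memory.kmem.slabinfo"
    try:
        slabinfo.read_text()
    except OSError as e:
        if e.errno in (errno.EIO, errno.ENOENT):
            return False
        raise
    return True
```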
I think we should probably add an agent flag to control whether kmem accounting is enabled, and only report stats if kmem accounting is enabled.
That sounds reasonable. I'll update the review with this.
Should I move the review to ReviewBoard or keep iterating here?
@fcuny We prefer ReviewBoard for non-trivial changes.
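The gating idea can be sketched in a few lines of Python. The flag name `--enable_kmem_accounting` and the helper names are hypothetical; this does not use the Mesos flags API and only shows the "report kmem stats only when the flag is set" logic.

```python
import argparse


def parse_flags(argv):
    """Parse a hypothetical boolean flag that gates kmem accounting."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--enable_kmem_accounting", action="store_true")
    return parser.parse_args(argv)


def build_stats(flags, base_stats, kmem_stats):
    """Merge kmem stats into the report only when the flag is set."""
    stats = dict(base_stats)
    if flags.enable_kmem_accounting:
        stats.update(kmem_stats)
    return stats
```

With the flag off, the kmem metrics are simply absent from the report instead of showing up as misleading zeros.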
@fcuny were you going to continue this work on ReviewBoard?