
cgroup v1/v2 compatibility issue when setting the memory limit below current usage

Open · kolyshkin opened this issue · 8 comments

With cgroup v1, when we set the memory limit below the current usage (runc update on a running container), the kernel returns EBUSY and runc fails with a clear error message:

ERRO[0000] unable to set memory limit to 27033 (current usage: 270336, peak usage: 6082560) 

With cgroup v2, when we do the same, the kernel accepts the new limit and the OOM killer simply kills the container. This makes the v2 behavior incompatible with cgroup v1.

One (imperfect) workaround is to add a flag to the OCI spec that disallows setting the memory limit to a value lower than the current usage. This is borderline ugly, but at least in most cases we'll return an error instead of letting the container be OOM-killed.
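The proposed flag could be checked roughly like this before any cgroup file is written. This is a sketch under assumptions: MemoryUpdate, its RefuseBelowUsage field and checkMemoryUpdate are hypothetical and not part of the OCI runtime spec, and the caller is assumed to have already read the current usage. A negative limit is treated as "unlimited" and never refused.

```go
package main

import "fmt"

// MemoryUpdate is a hypothetical stand-in for the memory part of a
// resources update. RefuseBelowUsage models the proposed (non-existent) flag.
type MemoryUpdate struct {
	Limit            int64 // bytes; negative means unlimited
	RefuseBelowUsage bool
}

// checkMemoryUpdate returns an error when the flag is set and the new limit
// is below the cgroup's current usage, so the runtime can fail early
// instead of letting cgroup v2 OOM-kill the container.
func checkMemoryUpdate(u MemoryUpdate, currentUsage uint64) error {
	if u.RefuseBelowUsage && u.Limit > 0 && uint64(u.Limit) < currentUsage {
		return fmt.Errorf("refusing to set memory limit to %d: below current usage %d", u.Limit, currentUsage)
	}
	return nil
}

func main() {
	err := checkMemoryUpdate(MemoryUpdate{Limit: 27033, RefuseBelowUsage: true}, 270336)
	fmt.Println(err)
}
```

Such a check narrows the window but cannot close it, since usage may grow between reading it and writing the limit.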

(The other, much less serious, part of the problem is that when the container disappears in the middle of runc update, we get all sorts of ugly messages.)

kolyshkin, Jun 14 '22

Could we use memory.high instead of memory.max?

giuseppe, Jun 15 '22

I don't have a complete understanding at this point, but are we talking about the cgroup memory limit applied at container creation time? If so, is the difference that in cgroup v2 the kernel no longer returns EBUSY?

> add a flag to OCI spec

And then have runc parse it and fail early instead of letting the container be OOM-killed?

danishprakash, Jun 15 '22

This is about updating the memory limit of an already running container to a value lower than what it is currently using. In v1 we got EBUSY, but in v2 the kernel applies the value and, if it is too low, the container is OOM-killed.

mrunalp, Jun 15 '22

> could we use memory.high instead of memory.max?

From the Vertical Pod Autoscaler's point of view, yes. That means it will still have to distinguish between v1 and v2, which in turn means it makes no sense to add the flag I proposed in the description.
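Telling v1 and v2 apart at runtime is commonly done by checking the filesystem type of /sys/fs/cgroup with statfs(2): a unified (v2) mount reports CGROUP2_SUPER_MAGIC (0x63677270). A Linux-only sketch using the standard library's syscall package; isCgroup2 is a hypothetical name, and hybrid setups (where v2 is mounted elsewhere, e.g. at /sys/fs/cgroup/unified) are not covered.

```go
package main

import (
	"fmt"
	"syscall"
)

// cgroup2SuperMagic is CGROUP2_SUPER_MAGIC from linux/magic.h.
const cgroup2SuperMagic = 0x63677270

// isCgroup2 reports whether mountpoint is a cgroup v2 (unified) mount by
// comparing the filesystem magic number returned by statfs(2).
func isCgroup2(mountpoint string) (bool, error) {
	var st syscall.Statfs_t
	if err := syscall.Statfs(mountpoint, &st); err != nil {
		return false, err
	}
	// The Type field's integer type varies by architecture, so convert.
	return int64(st.Type) == cgroup2SuperMagic, nil
}

func main() {
	v2, err := isCgroup2("/sys/fs/cgroup")
	if err != nil {
		fmt.Println("statfs failed:", err)
		return
	}
	fmt.Println("cgroup v2 unified mode:", v2)
}
```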

kolyshkin, Jun 16 '22

> could we use memory.high instead of memory.max

I think that will have to be phase 2 of cgroup v2 support in k8s. Phase 1 is just a direct mapping to v1.
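One reading of "direct mapping" is translating the v1-style memory limit straight into the v2 memory.max file, without adopting memory.high. A sketch under assumptions: v1ToV2MemoryMax is hypothetical, -1 (unlimited in v1) becomes the v2 keyword "max", and 0 is assumed to mean "not set", so nothing would be written.

```go
package main

import (
	"fmt"
	"strconv"
)

// v1ToV2MemoryMax (hypothetical) maps a v1 memory.limit_in_bytes-style
// value to the string written into cgroup v2 memory.max.
// -1 (unlimited) becomes "max"; 0 is treated as unset, so ok is false.
func v1ToV2MemoryMax(limit int64) (value string, ok bool) {
	switch {
	case limit == 0:
		return "", false
	case limit < 0:
		return "max", true
	default:
		return strconv.FormatInt(limit, 10), true
	}
}

func main() {
	for _, l := range []int64{-1, 0, 27033} {
		v, ok := v1ToV2MemoryMax(l)
		fmt.Printf("%d -> %q (set: %v)\n", l, v, ok)
	}
}
```

Such a mapping keeps the values compatible but not the behavior: a too-low memory.max still leads to an OOM kill instead of EBUSY.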

mrunalp, Jun 17 '22

Would it be possible to read the current memory usage from memory.current and, if the new limit is lower than that, skip the update and return an error? This may be doing too much for an OCI runtime, though?

utam0k, Aug 25 '22

Is there a similar problem with any configuration other than memory?

kamizjw, Sep 06 '22

> Is there a similar problem with other configurations other than memory?

Not that I know of.

kolyshkin, Sep 09 '22