runc
Recursively thaw freezer subcgroups during SIGKILL
Fixes #2105
This PR is offered as an example of what I think would work. Unfortunately, it kinda breaks some abstractions, and someone who knows the runC codebase could probably reorganize the code to better accommodate the added code.
So, this change could be refactored to not add ThawAll() as a method everywhere, and instead use the normal Set mechanism with a new configs.ThawedRecursive enum.
I think this would be less invasive and still fix the bug? Feedback on what would be preferred is welcome.
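For concreteness, a minimal sketch of that alternative, assuming libcontainer's existing FreezerState constants; `ThawedRecursive` is the proposed new value, and its exact name and string are placeholders rather than anything that exists in the codebase:

```go
// Sketch only: extend libcontainer's FreezerState instead of adding a
// ThawAll() method to every cgroup manager. ThawedRecursive is hypothetical.
package configs

type FreezerState string

const (
	Undefined FreezerState = ""
	Frozen    FreezerState = "FROZEN"
	Thawed    FreezerState = "THAWED"
	// ThawedRecursive would be handled by the freezer subsystem's Set path:
	// thaw the container's own cgroup, then every descendant sub-cgroup.
	ThawedRecursive FreezerState = "THAWED_RECURSIVE"
)
```

The SIGKILL path could then request the recursive thaw through the existing freezer Set handling rather than growing a new method on each cgroup manager.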
@mrunalp any feedback? Happy to refactor this but want to know your thoughts first. Thanks!
@hqhq any thoughts?
I don't think runc is quite robust for nested containers, but the idea of this fix looks good to me: a child container should not block the exit of its parent container.
Thanks for the feedback @hqhq! As @percontation mentioned above, instead of adding ThawAll() as a method everywhere, we could use the normal Set mechanism with a new configs.ThawedRecursive enum. Or did you have something else in mind? Just want to figure out how this can be improved so it can be merged in.
@percontation @mruck Current change looks good to me, just needs rebase, thanks.
@hqhq rebase done, let me know if there's anything else!
~~It looks like the new cgroups objects have had the corresponding changes made here as well. (Not totally sure, since I'm unfamiliar with the new cgroups stuff.) @mruck can you confirm?~~ err, wait, maybe not quite.
Updated so the change applies to cgroupsv2
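For context on what the cgroupsv2 update entails: the unified hierarchy exposes freezing through a per-cgroup `cgroup.freeze` file rather than v1's `freezer.state`, and a descendant that set its own freeze bit stays frozen even after its parent is thawed, so the thaw still has to visit every descendant. A rough sketch under those assumptions; `thawAllV2` is a hypothetical helper, not the PR's actual code:

```go
// Illustrative sketch only: recursively clear cgroup.freeze across a cgroup v2
// subtree so no frozen descendant survives the container's teardown.
package freezer

import (
	"os"
	"path/filepath"
)

func thawAllV2(cgroupPath string) error {
	return filepath.Walk(cgroupPath, func(path string, info os.FileInfo, err error) error {
		if err != nil || !info.IsDir() {
			return err
		}
		freezeFile := filepath.Join(path, "cgroup.freeze")
		if _, statErr := os.Stat(freezeFile); statErr != nil {
			return nil // no freeze knob here (e.g. the v2 root cgroup)
		}
		// Writing "0" thaws this cgroup; already-thawed cgroups are unaffected.
		return os.WriteFile(freezeFile, []byte("0"), 0o644)
	})
}
```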
@hqhq let me know if anything else needs to be updated.
Thanks! @mrunalp @crosbymichael any comments?
@mrunalp @crosbymichael anything we can do to get this merged in?
@mrunalp @crosbymichael hoping to get this merged in since we use this in production, and it looks like @circosta ran into something similar
My worry with this change is that people can set up sub-cgroups that may be related to the container but not bound to its lifecycle, and this would be a breaking change if we recursed down the tree and changed settings.
What use case are you solving, and how are you getting yourself into the position that is causing the error?
> My worry with this change is that people can set up sub-cgroups that may be related to the container but not bound to its lifecycle, and this would be a breaking change if we recursed down the tree and changed settings.

Yes, the recursive calls only apply in the SIGKILL case of container shutdown. So this patch doesn't cause any recursive "change of settings" for any process that runC wasn't about to send a SIGKILL to anyway (sketched just after this comment).

> What use case are you solving, and how are you getting yourself into the position that is causing the error?

We have an application that uses runC containers. Attempting to run this application on kubernetes-based infrastructure means that we're running nested runC. As described in #2105, running nested runC can result in the outer runC deadlocking as it attempts to exit (and this extends to cases beyond just nested runC; it applies to any runC container containing an application that wants to make use of freezer cgroups).
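To make the mechanics concrete: on cgroup v1, recursively thawing amounts to walking the container's freezer cgroup and writing THAWED into every descendant's `freezer.state` before the SIGKILL teardown proceeds. A minimal sketch, assuming direct filesystem access to the freezer cgroup path; the helper name `thawAll` and the plain file walk are illustrative, not the PR's exact code:

```go
// Illustrative sketch only: recursively thaw a cgroup v1 freezer hierarchy so
// that no frozen descendant can block the container's SIGKILL teardown.
package freezer

import (
	"os"
	"path/filepath"
)

func thawAll(freezerCgroupPath string) error {
	return filepath.Walk(freezerCgroupPath, func(path string, info os.FileInfo, err error) error {
		if err != nil || !info.IsDir() {
			return err
		}
		// Writing THAWED unfreezes this (sub-)cgroup; it is a no-op for
		// cgroups that are already thawed. Walk visits parents before
		// children, so ancestors are thawed first.
		return os.WriteFile(filepath.Join(path, "freezer.state"), []byte("THAWED"), 0o644)
	})
}
```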
@percontation so this would be in the case of killing the pod container while another container (inside the pod) is still frozen?
Yes. (Although, in our case, we're not intentionally suspending containers. The temporary freezing and unfreezing that runC does while sending signals is enough to cause the problem to exhibit from time to time.)
@crosbymichael is there anything else we can do to help so this gets merged?
@mrunalp @cyphar can you help review this one as well?
@mrunalp @cyphar can we get your feedback? really hoping to get this upstreamed since we are running into this in production on k8s
@crosbymichael any chance you can help me get the attention of @mrunalp @cyphar?
@mruck can you give this one more shot by rebasing?
Rebases have not been entirely trivial since the underlying cgroupsv2 code keeps changing, so it's probably not worth the effort until there's enough interest in reviewing the bugfix anyways ¯\\_(ツ)_/¯
On the review side at least, there's one more new reviewer: @AkihiroSuda. He might be willing to give this a shot (in addition to the LGTM from @hqhq, which will have to be reconfirmed).
Ah, cool. Yeah, if things actually start moving here I'm happy to deal with the conflicts.
I think we can take this in after the 1.0 release. Nothing bad in thawing a container which we're going to kill anyway :)
@percontation PTAL ^^^ (and this needs a rebase)