runc
Recursively thaw freezer subcgroups during SIGKILL
Fixes #2105
This PR is offered as an example of what I think would work. Unfortunately, it kinda breaks some abstractions, and someone who knows the runC codebase could probably reorganize the code to better accommodate the added code.
So, this change could be refactored to not add ThawAll() as a method everywhere, and instead use the normal Set mechanism with a new configs.ThawedRecursive enum.
I think this would be less invasive and still fix the bug? Feedback on what would be preferred is welcome.
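For concreteness, a minimal sketch of that alternative, assuming libcontainer's existing FreezerState constants; `ThawedRecursive` is the proposed new value, and its exact name and string are placeholders rather than anything that exists in the codebase:

```go
// Sketch only: extend libcontainer's FreezerState instead of adding a
// ThawAll() method to every cgroup manager. ThawedRecursive is hypothetical.
package configs

type FreezerState string

const (
	Undefined FreezerState = ""
	Frozen    FreezerState = "FROZEN"
	Thawed    FreezerState = "THAWED"
	// ThawedRecursive would be handled by the freezer subsystem's Set path:
	// thaw the container's own cgroup, then every descendant sub-cgroup.
	ThawedRecursive FreezerState = "THAWED_RECURSIVE"
)
```

The SIGKILL path could then request the recursive thaw through the existing freezer Set handling rather than growing a new method on each cgroup manager.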
@mrunalp any feedback? Happy to refactor this but want to know your thoughts first. Thanks!
@hqhq any thoughts?
I don't think runc is quite robust for nested containers, but the idea of this fix looks good to me: a child container should not block the exit of its parent container.
Thanks for the feedback @hqhq! As @percontation mentioned above, instead of adding ThawAll() as a method everywhere, we could use the normal Set mechanism with a new configs.ThawedRecursive enum. Or did you have something else in mind? Just want to figure out how this can be improved so it can be merged in.
@percontation @mruck Current change looks good to me, just needs rebase, thanks.
@hqhq rebase done, let me know if there's anything else!
~~It looks like the new cgroups objects have had the corresponding changes made here as well. (Not totally sure, since I'm unfamiliar with the new cgroups stuff.) @mruck can you confirm?~~ err, wait, maybe not quite.
Updated so the change applies to cgroupsv2
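For context on what the cgroupsv2 update entails: the unified hierarchy exposes freezing through a per-cgroup `cgroup.freeze` file rather than v1's `freezer.state`, and a descendant that set its own freeze bit stays frozen even after its parent is thawed, so the thaw still has to visit every descendant. A rough sketch under those assumptions; `thawAllV2` is a hypothetical helper, not the PR's actual code:

```go
// Illustrative sketch only: recursively clear cgroup.freeze across a cgroup v2
// subtree so no frozen descendant survives the container's teardown.
package freezer

import (
	"os"
	"path/filepath"
)

func thawAllV2(cgroupPath string) error {
	return filepath.Walk(cgroupPath, func(path string, info os.FileInfo, err error) error {
		if err != nil || !info.IsDir() {
			return err
		}
		freezeFile := filepath.Join(path, "cgroup.freeze")
		if _, statErr := os.Stat(freezeFile); statErr != nil {
			return nil // no freeze knob here (e.g. the v2 root cgroup)
		}
		// Writing "0" thaws this cgroup; already-thawed cgroups are unaffected.
		return os.WriteFile(freezeFile, []byte("0"), 0o644)
	})
}
```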
@hqhq let me know if anything else needs to be updated.
Thanks! @mrunalp @crosbymichael any comments?
@mrunalp @crosbymichael anything we can do to get this merged in?
@mrunalp @crosbymichael hoping to get this merged in since we use this in production, and it looks like @circosta ran into something similar
My worry with this change is that people can set up sub-cgroups that may be related to the container but not bound to its lifecycle, and this would be a breaking change if we recursed down the tree and changed settings.
What use case are you solving, and how are you getting yourself into the position that is causing the error?
> My worry with this change is that people can set up sub-cgroups that may be related to the container but not bound to its lifecycle, and this would be a breaking change if we recursed down the tree and changed settings.

Yes, the recursive calls only apply in the SIGKILL case of container shutdown. So this patch doesn't cause any recursive "change of settings" for any process that runC wasn't about to send a SIGKILL to anyway (sketched just after this comment).

> What use case are you solving, and how are you getting yourself into the position that is causing the error?

We have an application that uses runC containers. Attempting to run this application on kubernetes-based infrastructure means that we're running nested runC. As described in #2105, running nested runC can result in the outer runC deadlocking as it attempts to exit (and this extends to cases beyond just nested runC; it applies to any runC container containing an application that wants to make use of freezer cgroups).
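To make the mechanics concrete: on cgroup v1, recursively thawing amounts to walking the container's freezer cgroup and writing THAWED into every descendant's `freezer.state` before the SIGKILL teardown proceeds. A minimal sketch, assuming direct filesystem access to the freezer cgroup path; the helper name `thawAll` and the plain file walk are illustrative, not the PR's exact code:

```go
// Illustrative sketch only: recursively thaw a cgroup v1 freezer hierarchy so
// that no frozen descendant can block the container's SIGKILL teardown.
package freezer

import (
	"os"
	"path/filepath"
)

func thawAll(freezerCgroupPath string) error {
	return filepath.Walk(freezerCgroupPath, func(path string, info os.FileInfo, err error) error {
		if err != nil || !info.IsDir() {
			return err
		}
		// Writing THAWED unfreezes this (sub-)cgroup; it is a no-op for
		// cgroups that are already thawed. Walk visits parents before
		// children, so ancestors are thawed first.
		return os.WriteFile(filepath.Join(path, "freezer.state"), []byte("THAWED"), 0o644)
	})
}
```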
@percontation so this would be in the case of killing the pod container while another container (inside the pod) is still frozen?
Yes. (Although, in our case, we're not intentionally suspending containers. The temporary freezing and unfreezing that runC does while sending signals is enough to cause the problem to exhibit from time to time.)
@crosbymichael is there anything else we can do to help so this gets merged?
@mrunalp @cyphar can you help review this one as well?
@mrunalp @cyphar can we get your feedback? really hoping to get this upstreamed since we are running into this in production on k8s
@crosbymichael any chance you can help me get the attention of @mrunalp @cyphar?
@mruck can you give this one more shot by rebasing?
Rebases have not been entirely trivial since the underlying cgroupsv2 code keeps changing, so it's probably not worth the effort until there's enough interest in reviewing the bugfix anyways ¯\\_(ツ)_/¯
On the review side at least, there's one more new reviewer: @AkihiroSuda. He might be willing to give this a shot (in addition to the LGTM from @hqhq, which will have to be reconfirmed).
Ah, cool. Yeah, if things actually start moving here I'm happy to deal with the conflicts.
I think we can take this in after the 1.0 release. Nothing bad in thawing a container which we're going to kill anyway :)
@percontation PTAL ^^^ (and this needs a rebase)