[WIP] Make /proc/sys read-only with carve-outs for some sysctls

Open dgl opened this issue 4 months ago • 4 comments

As mentioned on #3511 this could be a more complete way to ensure systemd or other components don't change sysctls unexpectedly. This also makes sysfs mountable per #3436 (but that is just the mount of sysfs on /kind/private/sys, so can easily be split, aside from any naming preferences).

WIP as I'm not sure it's the best option, but possibly better than fragile breakage due to unexpected sysctl changes.

The downside is it needs an allow list of sysctls which is probably going to need additions for other use cases, but it does mean kind can be explicit about what is supported.

The workaround to add a sysctl as writable would be:

docker exec a-node mount --rbind /kind/private/proc/sys/some-sysctl /proc/sys/some-sysctl
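If more than one sysctl needs to be writable, the same bind mount could be repeated per path. The following is a minimal sketch and not part of the PR: `a-node` is the placeholder node name from the command above, `carve_out_cmd`, `carve_out_all` and the `DRY_RUN` switch are hypothetical helpers, and it assumes the writable copy lives under /kind/private/proc/sys as described here.

```shell
#!/bin/sh
# Sketch only: re-expose selected sysctls as writable inside a kind node,
# assuming the layout from this PR (writable copy under /kind/private/proc/sys).

# Print the bind-mount command for one sysctl path relative to /proc/sys.
carve_out_cmd() {
  node="$1"
  sysctl_path="$2"
  printf 'docker exec %s mount --rbind /kind/private/proc/sys/%s /proc/sys/%s\n' \
    "$node" "$sysctl_path" "$sysctl_path"
}

# Apply the bind mount for each listed sysctl, or only print it if DRY_RUN=1.
carve_out_all() {
  node="$1"
  shift
  for sysctl_path in "$@"; do
    if [ "${DRY_RUN:-0}" = "1" ]; then
      carve_out_cmd "$node" "$sysctl_path"
    else
      docker exec "$node" mount --rbind \
        "/kind/private/proc/sys/$sysctl_path" "/proc/sys/$sysctl_path"
    fi
  done
}
```

For example, `DRY_RUN=1 carve_out_all a-node net/ipv4/ip_forward kernel/panic` prints the two commands without touching the node; the sysctl paths there are illustrative, not a list of what this PR allows.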

(This doesn't yet support running in some userns configurations, but that should just be a matter of ignoring the error from mount if it fails; whether the mount works depends on the exact userns environment. In a user namespace the host's sysctls can't be modified anyway. I can test userns cases if this option is worth taking further.)
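Ignoring the error from mount could look roughly like the wrapper below. This is a sketch under assumptions, not the PR's code: `try_mount` is a hypothetical name, and which mount call actually fails in a given userns setup is not stated here, so the wrapper just takes whatever command it is given.

```shell
#!/bin/sh
# Sketch only: run a mount command, but treat failure as non-fatal.
# The command's own stderr is left visible so the reason for a failure
# can still be seen in logs.
try_mount() {
  if ! "$@"; then
    echo "warning: command failed, continuing: $*" >&2
  fi
  return 0
}
```

It would be called as `try_mount mount --rbind SRC DST` in place of the bare `mount`, so a script running with `set -e` carries on when the mount is refused.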

dgl avatar Feb 13 '24 03:02 dgl

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: dgl

Once this PR has been reviewed and has the lgtm label, please assign aojea for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Approvers can indicate their approval by writing /approve in a comment. Approvers can cancel approval by writing /approve cancel in a comment.

k8s-ci-robot avatar Feb 13 '24 03:02 k8s-ci-robot

Hi @dgl. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

k8s-ci-robot avatar Feb 13 '24 03:02 k8s-ci-robot

> ensure systemd or other components don't change sysctls unexpectedly

Rootless mode ( https://kind.sigs.k8s.io/docs/user/rootless/ ) almost solves this issue.

AkihiroSuda avatar Feb 13 '24 13:02 AkihiroSuda

> As mentioned on https://github.com/kubernetes-sigs/kind/pull/3511 this could be a more complete way to ensure systemd or other components don't change sysctls unexpectedly. This also makes sysfs mountable per https://github.com/kubernetes-sigs/kind/issues/3436 (but that is just the mount of sysfs on /kind/private/sys, so can easily be split, aside from any naming preferences).

I'm really hesitant to ship a change like this because it's hard to say how we'll break users that have come to rely on this over the years. Disabling something like udev/binfmt_misc, on the other hand, is cheap and reasonable, at the risk of missing some future systemd behavior.

BenTheElder avatar Feb 13 '24 18:02 BenTheElder