
Remove explicit sysctl fs.inotify.max_user_watches setting

Open ajoga opened this issue 4 months ago • 21 comments

Since Linux 5.11-rc1, fs.inotify.max_user_watches is computed dynamically from the addressable physical memory, up to a maximum of 1048576: https://github.com/torvalds/linux/commit/92890123749bafc317bbfacbe0a62ce08d78efb7 .

I suggest removing the current explicit setting, which pins a lower maximum, and relying on the kernel's dynamic default instead. That default can save memory on smaller nodes that do not need a high value.

Back-of-the-envelope math based on the commit linked above suggests that on a 64-bit host with 64 GB of memory, fs.inotify.max_user_watches would be set to 524288, the value that is currently hard-coded.
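
For readers who want to redo that estimate, here is a minimal Go sketch of the calculation as I read the linked commit: about 1% of addressable memory divided by a per-watch cost, clamped to the range [8192, 1048576]. The function name `estimateMaxUserWatches` and the 1024-byte per-watch cost are assumptions for illustration only. The kernel derives the real cost from its own struct sizes and works in pages rather than bytes, so exact results on a given host will differ.

```go
package main

import "fmt"

const (
	minWatches uint64 = 8192    // lower clamp, as I read the linked commit
	maxWatches uint64 = 1048576 // upper clamp, also quoted in this thread
)

// estimateMaxUserWatches approximates the kernel's dynamic default:
// 1% of memory divided by the cost of one watch, clamped to
// [minWatches, maxWatches]. watchCostBytes is a caller-supplied assumption.
func estimateMaxUserWatches(memBytes, watchCostBytes uint64) uint64 {
	if watchCostBytes == 0 {
		return minWatches // avoid dividing by zero; fall back to the floor
	}
	watches := memBytes / 100 / watchCostBytes
	if watches < minWatches {
		return minWatches
	}
	if watches > maxWatches {
		return maxWatches
	}
	return watches
}

func main() {
	// Hypothetical per-watch cost in bytes; not taken from the kernel source.
	const assumedWatchCost = 1024

	for _, gib := range []uint64{1, 4, 16, 64, 256} {
		fmt.Printf("%3d GiB -> %d watches\n", gib, estimateMaxUserWatches(gib<<30, assumedWatchCost))
	}
}
```

Small nodes land on the 8192 floor and very large ones on the 1048576 ceiling; where a 64 GB host falls in between depends on the per-watch cost you plug in.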

ajoga avatar Aug 13 '25 07:08 ajoga

CLA Signed

The committers listed above are authorized under a signed CLA.

  • :white_check_mark: login: ajoga / name: Aurélien Joga (e6aa1e746fda0d05fc8c9db8bfd73af8f7ffd19b)

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:

Once this PR has been reviewed and has the lgtm label, please assign olemarkus for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment.
Approvers can cancel approval by writing /approve cancel in a comment.

k8s-ci-robot avatar Aug 13 '25 07:08 k8s-ci-robot

Welcome @ajoga!

It looks like this is your first PR to kubernetes/kops 🎉. Please refer to our pull request process documentation to help your PR have a smooth ride to approval.

You will be prompted by a bot to use commands during the review process. Do not be afraid to follow the prompts! It is okay to experiment. Here is the bot commands documentation.

You can also check if kubernetes/kops has its own contribution guidelines.

You may want to refer to our testing guide if you run into trouble with your tests not passing.

If you are having difficulty getting your pull request seen, please follow the recommended escalation practices. Also, for tips and tricks in the contribution process you may want to read the Kubernetes contributor cheat sheet. We want to make sure your contribution gets all the attention it needs!

Thank you, and welcome to Kubernetes. :smiley:

k8s-ci-robot avatar Aug 13 '25 07:08 k8s-ci-robot

Hi @ajoga. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

k8s-ci-robot avatar Aug 13 '25 07:08 k8s-ci-robot

I'll sign the CLA later; I don't have access to the device I need right now.

ajoga avatar Aug 13 '25 07:08 ajoga

/ok-to-test

hakman avatar Aug 13 '25 17:08 hakman

I understand that part of the change belongs in upstream fsnotify, not here. I've opened a PR there (https://github.com/fsnotify/fsnotify/pull/708); I'll wait for the outcome and update this PR accordingly.

ajoga avatar Aug 14 '25 09:08 ajoga

Well, in fact, no: I can do both at the same time. I removed the changes I had committed here that belong to fsnotify; let's see.

ajoga avatar Aug 14 '25 09:08 ajoga

/retest

ajoga avatar Aug 14 '25 09:08 ajoga

/test pull-kops-aws-distro-al2023
/test pull-kops-aws-distro-rhel9

ameukam avatar Aug 14 '25 11:08 ameukam

/test pull-kops-aws-distro-rhel9

ajoga avatar Aug 15 '25 07:08 ajoga

/test pull-kops-aws-distro-rhel9

ameukam avatar Aug 15 '25 09:08 ameukam

/retest

I think we can skip the failing rhel9 test.

ameukam avatar Aug 16 '25 09:08 ameukam

kOps has support for distros with pretty old kernels. Any idea how far back this change was back-ported? For example, is it part of RHEL 8, 9 and Amazon Linux 2, 2023? I don't think I am that worried about Ubuntu, Debian, Flatcar, COS.

hakman avatar Aug 16 '25 11:08 hakman

kOps has support for distros with pretty old kernels. Any idea how far back this change was back-ported? For example, is it part of RHEL 8, 9 and Amazon Linux 2, 2023? I don't think I am that worried about Ubuntu, Debian, Flatcar, COS.

Mh, this isn't a concern I anticipated, good point.

I'm not sure where to look for reliable information on RHEL, as I do not have access to their subscription-walled resources. However, it seems the change was backported to the CentOS 8 and 9 kernels before those distros were deprecated:

  • https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-8/-/blob/c8s/fs/notify/inotify/inotify_user.c?ref_type=heads#L819

  • https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/blob/main/fs/notify/inotify/inotify_user.c?ref_type=heads#L839

  • Amazon Linux 2 versions 2.0.20211223.0 and 2.0.20250201 -> kernel-4.14 -> no

  • Amazon Linux 2 version 2.0.20250808 -> kernel-5.10.240-238.959 -> yes

  • Amazon Linux 1 is EOL, so I didn't check it: https://docs.aws.amazon.com/linux/al2/ug/compare-with-al1.html

I don't think I am that worried about Ubuntu, Debian, Flatcar, COS.

Do you want me to look into this too or are you saying we don't care?

I'd hate to be the source of a backward breakage, and it may be sensible to not do this change at this time, so feel free to close the PR if you see it that way too.

ajoga avatar Aug 18 '25 07:08 ajoga

FWIW, you may be able to find the kernel versions for RHEL releases here: https://access.redhat.com/articles/3078.

ameukam avatar Aug 18 '25 08:08 ameukam

Do you want me to look into this too or are you saying we don't care?

I'd hate to be the source of a backward breakage, and it may be sensible to not do this change at this time, so feel free to close the PR if you see it that way too.

@ajoga I think the change is good, we just need to add an exception for the older distros. Should be pretty easy, but would require a little research. Would that be ok for you?
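
One possible shape for such an exception, as a rough Go sketch and not the actual kOps implementation: keep the explicit sysctl only when the running kernel is older than 5.11. The function names `parseKernelRelease` and `needsExplicitInotifySysctl` are hypothetical. A plain version comparison cannot see backports, such as the ones reported earlier in this thread for CentOS Stream and the Amazon Linux 2 5.10 kernel, so this sketch errs on the side of keeping the explicit value on those hosts.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseKernelRelease extracts the major and minor numbers from a kernel
// release string such as "5.10.240-238.959.amzn2.x86_64" or "5.11-rc1".
func parseKernelRelease(release string) (major, minor int, err error) {
	parts := strings.SplitN(release, ".", 3)
	if len(parts) < 2 {
		return 0, 0, fmt.Errorf("unexpected kernel release %q", release)
	}
	major, err = strconv.Atoi(parts[0])
	if err != nil {
		return 0, 0, fmt.Errorf("bad major version in %q: %w", release, err)
	}
	// The minor field may carry a suffix (for example "11-rc1"),
	// so keep only its leading digits.
	end := 0
	for end < len(parts[1]) && parts[1][end] >= '0' && parts[1][end] <= '9' {
		end++
	}
	minor, err = strconv.Atoi(parts[1][:end])
	if err != nil {
		return 0, 0, fmt.Errorf("bad minor version in %q: %w", release, err)
	}
	return major, minor, nil
}

// needsExplicitInotifySysctl reports whether the explicit
// fs.inotify.max_user_watches setting should be kept for this kernel.
// Unparseable releases are treated as old kernels, to stay conservative.
func needsExplicitInotifySysctl(release string) bool {
	major, minor, err := parseKernelRelease(release)
	if err != nil {
		return true
	}
	return major < 5 || (major == 5 && minor < 11)
}

func main() {
	for _, release := range []string{
		"4.14.355-275.570.amzn2.x86_64", // illustrative release strings
		"5.10.240-238.959.amzn2.x86_64",
		"6.1.0-18-amd64",
	} {
		fmt.Printf("%-32s keep explicit sysctl: %v\n", release, needsExplicitInotifySysctl(release))
	}
}
```

Keying the exception on the distro and its release instead would let backported kernels use the dynamic default, at the cost of maintaining that list.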

hakman avatar Aug 25 '25 07:08 hakman

Do you want me to look into this too or are you saying we don't care? I'd hate to be the source of a backward breakage, and it may be sensible to not do this change at this time, so feel free to close the PR if you see it that way too.

@ajoga I think the change is good, we just need to add an exception for the older distros. Should be pretty easy, but would require a little research. Would that be ok for you?

I have zero Go skills and no time to dig into this, so I have to decline, sorry.

(FYI changes in the documentation strings at https://github.com/fsnotify/fsnotify/pull/708 were adjusted by a maintainer & merged)

ajoga avatar Sep 01 '25 14:09 ajoga

Thanks for all the effort @ajoga, I will take it from here.

hakman avatar Sep 02 '25 06:09 hakman

The Kubernetes project currently lacks enough contributors to adequately respond to all PRs.

This bot triages PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the PR is closed

You can:

  • Mark this PR as fresh with /remove-lifecycle stale
  • Close this PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot avatar Dec 01 '25 06:12 k8s-triage-robot

/remove-lifecycle stale
/assign

hakman avatar Dec 01 '25 08:12 hakman