node-feature-discovery icon indicating copy to clipboard operation
node-feature-discovery copied to clipboard

Disable hook support by default

Open eero-t opened this issue 3 years ago • 3 comments

What would you like to be added:

Disable support for hooks by default, and make minimal image the default. Add NFD worker option to enable support for hooks, for backwards compatibility. Deprecate and eventually remove support for dynamic binary + script hooks (to prepare for potential removal of also static binaries hook support).

Feature files can support all the same things while avoiding the issues plaguing hooks (listed below).

Why is this needed:

There are multiple issues with hook support: https://kubernetes-sigs.github.io/node-feature-discovery/v0.11/advanced/customization-guide.html#hooks

Bash, Perl and dynamic binaries support:

  • Security: larger image + attack surface
  • Maintenance: need to update these images to handle security issues in them / their dependencies (in addition to Go ones relevant for NFD itself), see e.g: #853
  • Maintenance: unintentional(?) support also for dynamic executables which happen to link to same system libraries, that NFD users could start relying on
  • Documentation: missing on what exactly is supported in given NFD version, which Bash and Perl version, which Perl modules, and which dependency libraries
    • In theory this could be auto-generated for each release image, and diff generated against previous release

Hook support in general:

  • Security: impact of running "random" scripts / binaries in same context as NFD
  • Functionality: Whomever provides the hook, not being able to impact what is exposed to it
  • Documentation: missing on what is exposed for the hook from the system and network
  • Reliability: not being able to tighten NFD worker resource limits as resource usage of hooks is unknown
  • Reliability: additional validation needed for verifying that NFD handles + logs misbehaving hooks correctly (e.g. ones generating output forever or using too much of some other resources)
  • Conformance: binaries in /etc/ being FHS spec violation: https://refspecs.linuxfoundation.org/FHS_3.0/fhs/ch03s07.html

External feature life-cycle management

I think feature life-cycle management would also be easier if support for hooks were disabled/dropped.

Life-cycle issues:

  • Documentation: what party is supposed to clean out the obsolete features and how?
  • Functionality: what happens if e.g. Pod adds differently named hook binary or feature file on each run, or some test Pods run year ago were obsolete the next day, but their features still persist in /etc/? NFD continues running them and slowing down with time?
  • Documentation: what happens if multiple hooks or feature files update same labels, does NFD complain loudly, or just take the last value it gets?

Potential solutions:

  • Feature files could include best-before timestamp to solve that, after which NFD could ignore or remove them. With source hooks, another file and some mapping between them would be needed for that
  • NFD could trivially reject + log (maybe even remove) "obviously too large" feature files. For binaries one cannot easily say what reasonable binary size is too large (hopefully 1GB would be obviously too large)
  • Admin could also put features directory to tmpfs to make sure obsolete stuff gets cleaned out on next reboot, but with (static) binaries in there, that could take "too much" RAM
  • Within same file / hook, last of multiple value declarations for same label could be taken (with warnings on extra ones), but IMHO between different files / hooks, duplicate labels with different values should be a clearly user visible error (with same values giving just warning)

eero-t avatar Aug 12 '22 15:08 eero-t

Btw. was nfd-worker local socket ever considered for label updates instead of hooks?

eero-t avatar Aug 12 '22 15:08 eero-t

Regarding the hooks, thanks @eero-t for starting this discussion. I basically agree with this and have been anxious about the hook support myself, especially after we introduced the minimal image. Hooks are a hack'ish burden that I'm not entirely sure that how many people are even using.

One sketch of a plan to remove them completely over multiple releases:

  1. n (v0.12): declare hooks as deprecated, introduce config option to disable them but have them still enabled by defailt
  2. n+1: disable hooks by default but have still the option to enable them
  3. n+2: drop hooks support entirely, make minimal image the default image, drop the big image or e.g. rename it to -debug

The life-cycle management of "external features" is another class of issues that is worth tracking separately. I think we can concentrate on feature files there as hooks are likely to go away 😄 @eero-t may I ask you to create a separate issue about this subject?

marquiz avatar Aug 15 '22 08:08 marquiz

Added separate tickets for the life-time management improvements, and for adding socket API for external features.

eero-t avatar Aug 15 '22 13:08 eero-t

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot avatar Nov 14 '22 08:11 k8s-triage-robot

/remove-lifecycle stale

marquiz avatar Nov 14 '22 10:11 marquiz

Moved to v0.14.0

marquiz avatar Dec 23 '22 10:12 marquiz

Maybe for v0.13 we could actually change the base image of the default image distroless/base (making it identical to the -minimal image). We could add a -full image variant to match the current default image (with debian:bullseye-slim as the base). We could also make limit the minimal image even more, base it on scratch but I don't know if that's something of interest(?)

marquiz avatar Jan 09 '23 14:01 marquiz

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot avatar Apr 09 '23 14:04 k8s-triage-robot

/remove-lifecycle stale

marquiz avatar Apr 11 '23 07:04 marquiz

Fixed by #1182 /close

marquiz avatar Jun 22 '23 12:06 marquiz

@marquiz: Closing this issue.

In response to this:

Fixed by #1182 /close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

k8s-ci-robot avatar Jun 22 '23 12:06 k8s-ci-robot