nri icon indicating copy to clipboard operation
nri copied to clipboard

Configurable Restrictions of Selected NRI Features

Open klihub opened this issue 9 months ago • 36 comments
trafficstars

Background

An increasing number of PRs or feature requests from the community propose adding new container mutation capabilities with direct security implications to NRI. See #123, #124, and #135 for some examples.

Based on the review discussions of these PRs it looks like

  1. there is a desire to extend NRI with such capabilities
  2. the ability to administratively lock down such capabilities would help accepting them
  3. some existing capabilities in NRI could be put under such control (OCI hook injection)

Proposed Enhancement

Extend the current implementation of NRI with the notion of restrictions.

Restrictions allow one to disable a subset of the container mutation capabilities present in NRI. Restrictions are defined in the configuration. Initially a single global set of restrictions are defined and they apply to all plugins. This can later be updated to a more fine-grained model with further improvements, if desired.

Restrictions are communicated to an NRI plugin during registration. The plugin can then report and choose not to start up if the configured restrictions prevented it from functioning properly. Initially OCI hook injection would be subject to restrictions. For backward compatibility with the current implementation, OCI hooks would default to an unrestricted configuration.

NRI request/response processing is updated to check a plugin's response for potential restriction violations before applying the requested mutations to containers. If restrictions are violated, the plugin is forcibly disconnected with an error message.

Proposed Implementation

Protocol Additions

Define the set of possible restrictions. Initially this only consist of OCI hook injection.

message Restrictions {
    // If set to true, adjustment of OCI hooks is disabled.
    bool oci_hooks = 1;
}

Include configured restrictions also in the plugin configuration request.

message ConfigureRequest {
  // Any plugin-specific data, if present among the NRI configuration.
  string config = 1;
  // Name of the runtime NRI is running in.
  string runtime_name = 2;
  // Version of the runtime NRI is running in.
  string runtime_version = 3;
  // Configured registration timeout in milliseconds.
  int64 registration_timeout = 4;
  // Configured request processing timeout in milliseconds.
  int64 request_timeout = 5;
  // Extra restrictions imposed by the runtime.
  Restrictions restrictions = 6;
}

Add convenience functions for runtime restriction checking.

func (r *Restrictions) CheckAdjustment(a *ContainerAdjustment) error {
	if a == nil || r == nil {
		return nil
	}
	if err := r.checkOciHookAdjustment(a); err != nil {
		return err
	}
	return nil
}

func (r *Restrictions) checkOciHookAdjustment(a *ContainerAdjustment) error {
	if !r.OciHooks || a.Hooks == nil {
		return nil
	}

	switch {
	case a.Hooks.Prestart != nil:
		return OciHooksRestricted
	case a.Hooks.CreateRuntime != nil:
		return OciHooksRestricted
	case a.Hooks.CreateContainer != nil:
		return OciHooksRestricted
	case a.Hooks.StartContainer != nil:
		return OciHooksRestricted
	case a.Hooks.Poststart != nil:
		return OciHooksRestricted
	case a.Hooks.Poststop != nil:
		return OciHooksRestricted
	}

	return nil
}

Add convenience functions for plugin side permission checks.

func (r *Restrictions) AllowOciHooks() bool {
	return r == nil || !r.OciHooks
}

Runtime Adaptation Changes

Add a new programmatic option for the NRI runtime adaptation interface to set the configured restrictions. Update the plugin response processing logic to first check a response for any potential violation of configured restrictions. If a violation is found, log an error and disconnect the plugin.

Plugin Stub Changes

Extract restrictions from the configuration message and make it available to the plugin.

Sample Plugin Changes

In the OCI hook injector plugin, check restrictions during plugin configuration logging an error and exiting if OCI hook injection is disallowed.

Changes for Pending PRs

  • For PR #123

Add restrictions for seccomp policy adjustment.

message Restrictions {
    // If set to true, adjustment of OCI hooks is disabled.
    bool oci_hooks = 1;
    // If set to true, adjustment of seccpomp policy is disabled.
    bool seccomp_policy = 2;
}

Implement/update restriction checking functions for seccomp policy.

var (
	RestrictionError        = fmt.Errorf("restriction violation")
	OciHooksRestricted      = fmt.Errorf("%w: OCI hook adjustment disabled", RestrictionError)
        SeccompPolicyRestricted = fmt.Errorf("%w: seccomp policy adjustment disabled, RestrictionError)
)

func (r *Restrictions) CheckAdjustment(a *ContainerAdjustment) error {
	if a == nil || r == nil {
		return nil
	}
	if err := r.checkOciHookAdjustment(a); err != nil {
		return err
	}
	if err := r.checkSeccompPolicyAdjustment(a); err != nil {
		return err
	}
	return nil
}

func (r *Restrictions) checkSeccompPolicyAdjustment(a *ContainerAdjustment) error {
	if !r.SeccompPolicy || a.Linux == nil || a.Linux.SeccompPolicy == nil {
		return nil
	}

        return SeccompPolicyRestricted
}

Add convenience functions for plugin side permissions checks.

func (r *Restrictions) AllowSeccompPolicy() bool {
	return r == nil || !r.SeccompPolicy
}
  • For PR #124 (or #135)

Add restrictions for linux namespace adjustment.

message Restrictions {
    // If set to true, adjustment of OCI hooks is disabled.
    bool oci_hooks = 1;
    // If set to true, adjustment of seccpomp policy is disabled.
    bool seccomp_policy = 2;
    // If set to true, adjustment of linux namespaces is disabled.
    bool namespaces = 3;
}

Implement/update violation checking for namespaces.

var (
	RestrictionError        = fmt.Errorf("restriction violation")
	OciHooksRestricted      = fmt.Errorf("%w: OCI hook adjustment disabled", RestrictionError)
        SeccompPolicyRestricted = fmt.Errorf("%w: seccomp policy adjustment disabled, RestrictionError)
        NamespacesRestricted    = fmt.Errorf("%w: namespace adjustment disabled, RestrictionError)
)

func (r *Restrictions) CheckAdjustment(a *ContainerAdjustment) error {
	if a == nil || r == nil {
		return nil
	}
	if err := r.checkOciHookAdjustment(a); err != nil {
		return err
	}
	if err := r.checkSeccompPolicyAdjustment(a); err != nil {
		return err
	}
	if err := r.checkNamespaceAdjustment(a); err != nil {
		return err
	}
	return nil
}

func (r *Restrictions) checkNamespaceAdjustment(a *ContainerAdjustment) error {
	if !r.Namespaces || a.Linux == nil || len(a.Linux.Namespaces) == 0 {
		return nil
	}

        return NamespacesRestricted
}

Add convenience functions for plugins to check permissions.

func (r *Restrictions) AllowNamespaces() bool {
	return r == nil || !r.Namespaces
}

Potential Future Improvements

If we implement a mechanism to securely authenticate and identify plugins, this scheme can be updated to allow additional, more fine-grained per plugin (or per authorization level) restrictions.

TODO: Provide a reasonably detailed description of how that would work.

klihub avatar Jan 28 '25 07:01 klihub

/cc @samuelkarp @kad

klihub avatar Jan 28 '25 07:01 klihub

/cc @haircommander

klihub avatar Jan 30 '25 15:01 klihub

I almost am thinking of restrictions as the opposite: I think for any hook an admin installs, they'd want a subset of containers to be modified but some of them not to be, instead of having a policy on which aspects of hooks are allowed to run. as in: an admin probably expects every hook installed works as expected, but may want only some containers to be affected. I'm not sure of a platform agnostic way to do this. In cri-o, we have the "allowed_annotations" notion that is enabled on runtime class level, but i'm not sure that's the right level here

haircommander avatar Jan 30 '25 18:01 haircommander

@haircommander I think we probably want both: allowlist/denylist for modifiable attributes (so an admin can categorically say "NRI plugins cannot inject hooks" for example) and allowlist/denylist for pods that can be modified.

samuelkarp avatar Jan 30 '25 18:01 samuelkarp

Right now with mount or devices adjustments you can escape to the host, I understand people wanting something more fine grained than NRI on/off, but right now adding seccomp or namespace adjustment doesn't change how unsecure NRI is, so do we really need to block all PRs until we figure out fine grained permissions ?

champtar avatar Jan 30 '25 20:01 champtar

Right now with mount or devices adjustments you can escape to the host

Yes, also with hook adjustments.

samuelkarp avatar Jan 30 '25 21:01 samuelkarp

Right now with mount or devices adjustments you can escape to the host, I understand people wanting something more fine grained than NRI on/off, but right now adding seccomp or namespace adjustment doesn't change how unsecure NRI is, so do we really need to block all PRs until we figure out fine grained permissions ?

@champtar Shouldn't we anyway go for the eventual granular controls which everybody considers fine-grained enough in multiple smaller steps ? We could first add the basic capability to lock down selected features globally, starting with OCI hooks. Then we could rebase and update #123 and #124/#135 to also add lockdown for the new control they introduce. This shouldn't be too difficult, and it could make it easier for some to accept the seccomp- and namespace-mutating changes.

klihub avatar Jan 31 '25 07:01 klihub

as in: an admin probably expects every hook installed works as expected, but may want only some containers to be affected. I'm not sure of a platform agnostic way to do this. In cri-o, we have the "allowed_annotations" notion that is enabled on runtime class level, but i'm not sure that's the right level here

@haircommander One way could be something similar to what is suggested above for implementing the feature-based restrictions: implement the (container NRI processing exclusion) logic in NRI itself, add an option for configuring it, and take the necessary configuration data for it from the runtime configuration.

One thing I find difficult to tell here is how flexible/versatile we'd need to be in recognizing when a container should be excluded from NRI processing. For instance, would it be enough to allow exclusion based on namespace and/or the absence/presence of some annotation ?

Also, it's worth noting that for some classes of plugins it might be problematic if they are completely unaware of the existence of some containers... for instance, if a plugin needs to know about all resources assigned to containers.

klihub avatar Jan 31 '25 09:01 klihub

for CRI-O's purposes, kubernetes namespaces would be sufficient. runtime class could also work. we just need some way it's exposed in the kubernetes API so we can connect policy decisions.

haircommander avatar Feb 04 '25 17:02 haircommander

for CRI-O's purposes, kubernetes namespaces would be sufficient. runtime class could also work. we just need some way it's exposed in the kubernetes API so we can connect policy decisions.

@haircommander Any comments/thoughts regarding the potentially problematic aspects of hiding some pods/containers completely from plugins ?

A few possibilities come to mind, but none stands out as an obvious winner to me. Ones that crossed my mind are

  • don't solve it, just document it
  • don't hide containers completely, just hide the (create, update) events where a plugin could alter the container
  • don't hide events, but pass additional info to indicate that container is hands-off for plugin alterations

These all come with their own set of problems.

klihub avatar Feb 06 '25 16:02 klihub

Right now with mount or devices adjustments you can escape to the host, I understand people wanting something more fine grained than NRI on/off, but right now adding seccomp or namespace adjustment doesn't change how unsecure NRI is, so do we really need to block all PRs until we figure out fine grained permissions ?

@champtar Shouldn't we anyway go for the eventual granular controls which everybody considers fine-grained enough in multiple smaller steps ? We could first add the basic capability to lock down selected features globally, starting with OCI hooks. Then we could rebase and update #123 and #124/#135 to also add lockdown for the new control they introduce. This shouldn't be too difficult, and it could make it easier for some to accept the seccomp- and namespace-mutating changes.

@klihub I fail to understand why some people push back on namespace / seccomp when everything is already fully open. Securing NRI can be done in // instead of delaying everything by multiple months, but I don't think I'm going to convince you :)

champtar avatar Feb 06 '25 18:02 champtar

@klihub I fail to understand why some people push back on namespace / seccomp when everything is already fully open. Securing NRI can be done in // instead of delaying everything by multiple months, but I don't think I'm going to convince you :)

@champtar I feel your frustration and I am sorry about it. But I think it's not me you'd need to convince. I'd be ready to go forward both with the pending PRs and this. I just don't know what I should do next to help things progress. I promised to write a design proposal (which this issue tries to be) instead of filing a PR, so that we can can discuss it, suggest better alternatives, additions or other improvements, and/or give an A-OK for me to progress with this.

Some additions have been suggested by @haircommander, and I know @kad has one cooking for per-plugin restrictions, but what comes to the acceptance of this as a whole, there's no verdict one way or the other whether this is good to go forward with. I would have a first implementation and additional commits for controlling seccomp and namespace adjustment (which ideally would come in each as part of the corresponding pending PR). I could file a PR from that for review, but I would feel much more confident doing so if I knew that folks are fine with this both as an approach and as a starting point.

klihub avatar Feb 07 '25 06:02 klihub

@champtar And sorry for my misleading previous 'in multiple steps' comment. What I was trying to say, is that if this is OK as a starting point and it is OK to go for more granular controls in multiple steps, then I think it should be possible to progress with this and the pending PRs pretty fast. But then I failed to say it.

klihub avatar Feb 07 '25 06:02 klihub

@champter the issue that brought config up was adding the ability to modify the kubernetes pod security context, with an API. The expectation being that the security context should be a minimum level of security for the running pod. Not adhering to the pod security context is a violation of the CRI contract. These plugins extend and enhance the container runtime but are not to unsecure the pod security context.

That's the argument.

Yes people have proxied cri container runtimes and runc engines .. and certain oci hooks also have the ability to mutate and add mounts and yes certain mounts can effect the node and attached in bad ways.

As @klihub says don't solve it, just document it is an option. My preference is a per runtime class/handler config (allow/deny) for now and consider a KEP to also allow patterns of use with k8s namespaces.

mikebrow avatar Feb 19 '25 20:02 mikebrow

@klihub I think on the allow deny it should be also subject to whether or not the container runtime is being asked to use the default container runtime config or not.

restrictions...
// If set to true, adjustment of seccomp policy is disabled for all but the default container runtime seccomp policy.
bool seccomp_policy = 2;

allow list.... some int/string with an enum.. seccomp_policy string chosen from "none", "defaultCROnly", "all"

restrictions... seccomp_policy string chosen from "all", "nonDefaultCROnly", "none"

mikebrow avatar Feb 19 '25 20:02 mikebrow

These plugins extend and enhance the container runtime but are not to unsecure the pod security context.

You would not run random build of containerd ? just don't run random NRI plugins.

If you want to use NRI to implement a stricter seccomp policy, you need to trust your NRI plugin to not actually apply a weaker policy (or no policy).

I would be great to have users that want those additional restriction explain their use cases and threat model because right now we are going to add knobs but I'm still not sure we are going to increase the overall security of the node.

champtar avatar Feb 19 '25 21:02 champtar

These plugins extend and enhance the container runtime but are not to unsecure the pod security context.

You would not run random build of containerd ? just don't run random NRI plugins.

If you want to use NRI to implement a stricter seccomp policy, you need to trust your NRI plugin to not actually apply a weaker policy (or no policy).

I would be great to have users that want those additional restriction explain their use cases and threat model because right now we are going to add knobs but I'm still not sure we are going to increase the overall security of the node.

Yes, one option is to require admins/cloud vendors to choose which plugins they have trust in.. this is what is meant by "As @klihub says don't solve it, just document it is an option." Or we can add validators .. trust and verify, and we can add config to show the admins a way to secure from plugin changes to pods that have selected a security context.

That said modifying the "default" container runtime seccomp profile is not the same as modifying one set by the pod spec. Huge difference. For example, it would be nice to have a plugin switch between the crio default or the containerd default.

In an on premise cloud I can easily see the admin installing their seccomp plugin and trusting it the same as they do containerd along with it's security stance / kubernetes/containerd config etc.

In a public cloud I can see some customer pods having one default security context and another set of pods from another customer having a different default security context as set in the pod spec or via web hook. Further, I can see using the runtime class to configure / pick and choose whether override is appropriate and/or which defaults to use for the container runtime default.

IOW this is not as easy as ... ok containerd cri .. get out the way we know what we're doing down here in the plugin.

mikebrow avatar Feb 20 '25 17:02 mikebrow

@klihub My apologies for the delay here.

Restrictions are communicated to an NRI plugin during registration. The plugin can then report and choose not to start up if the configured restrictions prevented it from functioning properly.

Is there a ton of value in communicating the restrictions to the plugins? I suppose a plugin can modify its behavior or (as you said) choose to not start up, but then that seems that the behavior requires understanding more about how the plugin works. Some alternatives could be:

  • reject all adjustments made by a plugin to a container if any of the adjustments are disallowed
  • reject the container from starting if a plugin attempts to make disallowed adjustments

Some advantage of those would be that the behavior is simple: either the plugin is non-functional or the entire container startup fails. But care would need to be taken in making sure the errors are exposed to the user properly.

Taking a step further back, I'm wondering if it's worth taking a cue from admission controllers. The two-phase approach of mutation and validation could be used here to allow an administrator to enforce a chosen policy, and (like Kubernetes) we could have a default policy that provides for some form of "recommended safe" behavior.

Trying to sketch this out a bit:

  • existing NRI plugins could be reclassified as mutating plugins and have no change to their behavior or interfaces
  • a new type of validating NRI plugin can be created
  • validating plugins could receive all the same inputs as mutating plugins and receive the list of adjustments (possibly annotated with the mutating plugin that proposed those adjustments)
  • the validating plugin can then allow or deny the container from running
  • we could build very simple policy directly into the NRI framework that default disallows certain classes of adjustments across the board (i.e., the new seccomp and namespace adjustments), but the container runtime can be configured at startup to disable that default policy
  • validating plugins could be required in configuration (so that an administrator can be assured that the plugin's decision will be taken into account before any container has the opportunity to start)

I think this might address the concerns from @mikebrow (especially around default policy and "can we enforce that only the runtime default seccomp policy is eligible for modification) and from @haircommander (as a validating plugin could be Kubernetes namespace-aware or runtime class-aware).

Please poke holes in this suggestion.

samuelkarp avatar Feb 21 '25 00:02 samuelkarp

@klihub My apologies for the delay here.

Restrictions are communicated to an NRI plugin during registration. The plugin can then report and choose not to start up if the configured restrictions prevented it from functioning properly.

Is there a ton of value in communicating the restrictions to the plugins? I suppose a plugin can modify its behavior or (as you said) choose to not start up

The primary motivations were:

  • simple behavior: a plugin either can do everything it wants to, or none of it
  • (possible) early error detection: fail to start up early rather than fail to function later
  • simple handling of attempted violations: kick out the plugin on the basis of 'Hey... I told you so'

but then that seems that the behavior requires understanding more about how the plugin works.

And it might be difficult for the plugin to tell if it really could function. What it can only tell for sure is if it might fail. For instance, if for some corner case there is an extra necessary adjustment which is disallowed, but such a corner case would never be encountered with the set of containers run in the cluster, the plugin can't tell this a priori. An admin might, but then that would become a configuration option for the plugin anyway...

Some alternatives could be:

  • reject all adjustments made by a plugin to a container if any of the adjustments are disallowed
  • reject the container from starting if a plugin attempts to make disallowed adjustments

The former (rejecting adjustments) was the proposed behavior, plus kicking out the plugin (since it is now proven to try and disobey restrictions). But, now that I see you spell that out, I think we should fail to create/start the container as well (since now we also know for a fact that at least one plugin thinks that the container should not be started in pristine state, without adjustments).

Some advantage of those would be that the behavior is simple: either the plugin is non-functional or the entire container startup fails. But care would need to be taken in making sure the errors are exposed to the user properly.

Taking a step further back, I'm wondering if it's worth taking a cue from admission controllers. The two-phase approach of mutation and validation could be used here to allow an administrator to enforce a chosen policy, and (like Kubernetes) we could have a default policy that provides for some form of "recommended safe" behavior.

Trying to sketch this out a bit:

  • existing NRI plugins could be reclassified as mutating plugins and have no change to their behavior or interfaces
  • a new type of validating NRI plugin can be created
  • validating plugins could receive all the same inputs as mutating plugins and receive the list of adjustments (possibly annotated with the mutating plugin that proposed those adjustments)
  • the validating plugin can then allow or deny the container from running
  • we could build very simple policy directly into the NRI framework that default disallows certain classes of adjustments across the board (i.e., the new seccomp and namespace adjustments), but the container runtime can be configured at startup to disable that default policy
  • validating plugins could be required in configuration (so that an administrator can be assured that the plugin's decision will be taken into account before any container has the opportunity to start)

I think this might address the concerns from @mikebrow (especially around default policy and "can we enforce that only the runtime default seccomp policy is eligible for modification) and from @haircommander (as a validating plugin could be Kubernetes namespace-aware or runtime class-aware).

Please poke holes in this suggestion.

Okay, so let me try to paraphrase the essence of your proposal to see if I understand it correctly. Basically what you are saying is

  • Don't try to codify into (configuration) data what kind of NRI adjustments are allowed/forbidden and then accept/reject collected adjustments based on this.
  • Instead, extend NRI itself, allowing this to happen programmatically.
  • Define a new type of validating plugin (or validation request/event).
  • Pass collected adjustments (probably together with info about the adjusting plugins) to the validating plugins.
  • They will decide whether to adjust the container or reject its creation/startup.
  • We might also have coarse on/off restrictions policy/control built-in to NRI itself, but that gets disabled when validating plugins are in use.

If this is about right, I think it's an interesting proposal, and definitely worth serious consideration.

Some initial related thoughts:

  • Could definitely allow arbitrarily fine-grained restrictions control much better than trying to do it with configuration.
  • Since we'd use a plugin to decide if another plugin is allowed to adjust things, we'd need some mechanism to establish strong trust in the former. Maybe something like @kad had in mind for plugin authorization.
  • Related to the possibility of 'requiring validating plugins in the configuration', some potential chicken-and-egg problems if one would like to deploy the validating plugin itself as a K8s DaemonSet.

klihub avatar Feb 21 '25 09:02 klihub

But, now that I see you spell that out, I think we should fail to create/start the container as well (since now we also know for a fact that at least one plugin thinks that the container should not be started in pristine state, without adjustments).

Yeah, I think so too. This is a form of fail-closed, since we know that the un-adjusted container "should not" run.

Okay, so let me try to paraphrase the essence of your proposal to see if I understand it correctly. Basically what you are saying is

Your paraphrase is about right. The set of constraints/requirements I'm thinking about in this problem space includes:

  • having safe defaults (i.e., some sort of simple built-in policy)
  • but not blocking more complex use-cases and allowing administrators to define their own policies
  • avoiding creating a new policy language inside NRI (which would imply that we need to pre-define all sets of possible policies, or else limit the expressiveness of policies)
  • allowing the policy to have the right level of complexity for the given situation (i.e., there can be simple policies, but capturing the exceptions might be challenging if those exceptional cases are not uniform)

With the model I've proposed, a validating plugin could:

  • implement the requirements @haircommander discussed by requiring that some set of containers has a given setting or adjustment, but not applying that requirement to all containers
  • satisfy https://github.com/containerd/nri/issues/142 by determining whether a container should proceed if a plugin did not apply its adjustments cleanly (similar to @klihub's idea of a "verification plugin" in https://github.com/containerd/nri/issues/142#issuecomment-2649011688) (cc @aojea)
  • implement @mikebrow's idea where "only the runtime default seccomp policy should allow adjustments"

And I hope the simple built-in policy (like the default restriction of any seccomp adjustment) can satisfy @mikebrow's concern about overstepping security considerations that were expressed by the Kubelet.

Some initial related thoughts:

  • Could definitely allow arbitrarily fine-grained restrictions control much better than trying to do it with configuration.

Yes, I hope so.

  • Since we'd use a plugin to decide if another plugin is allowed to adjust things, we'd need some mechanism to establish strong trust in the former. Maybe something like @kad had in mind for plugin authorization.
  • Related to the possibility of 'requiring validating plugins in the configuration', some potential chicken-and-egg problems if one would like to deploy the validating plugin itself as a K8s DaemonSet.

These two might be two sides of the same coin; one possibility could be an initial validating plugin that is only defined in the configuration (and thus can't be deployed as a K8s daemonset) that then waits for the more-complete validating plugin to be installed before allowing any other workloads to proceed. But I think this might be one of the most questionable parts of my proposal that still needs more fine-tuning.

samuelkarp avatar Feb 21 '25 22:02 samuelkarp

@samuelkarp I like the idea of validating NRI plugin, but not sure what would be the benefit of also having some config per type, either you trust the mutating plugins and no point in limiting, or you want to give NRI access to untrusted plugins and you really need in depth validation.

note: for the implementation, we would need to fail the validation if new fields are present (containerd more recent than the validating plugin)

champtar avatar Feb 22 '25 06:02 champtar

I really like @samuelkarp proposal, and the model is well known in kubernetes , I love the idea of having the same mental model

one possibility could be an initial validating plugin that is only defined in the configuration (and thus can't be deployed as a K8s daemonset) that then waits for the more-complete validating plugin to be installed before allowing any other workloads to proceed. But I think this might be one of the most questionable parts of my proposal that still needs more fine-tuning.

Yeah, this is also one of the main problem of validating webhooks, since you need to deploy "something" that need to validate other "something", and your validation webhook may be restarted and fail legit operations or not start when you want, I've seen multiple issues because of this model of deploying validation as a separate component ... the answer of kubernetes to this problem is not to do validation as an external plugin, but embed the capability of doing validation in the control plane itself via configuration using a flexible language like CEL https://kubernetes.io/docs/reference/access-authn-authz/validating-admission-policy/ ...

aojea avatar Feb 22 '25 09:02 aojea

On the validating plugin: nod, we've talked in the past about introducing plugin dependencies/ordering.. validating plugin(s) are also a viable option to allow resource managers access to the bottom of the NRI plugin stack. Though if seen as a requirement for security context mutations it may make these mutations overly complex. Considerations: How many mutation plugins/validators would be needed to handle separation of concern issues; also the ole problem of every hook/plugin wants to be last in the chain to consider.

mikebrow avatar Feb 25 '25 14:02 mikebrow

On the validating plugin: nod, we've talked in the past about introducing plugin dependencies/ordering.. validating plugin(s) are also a viable option to allow resource managers access to the bottom of the NRI plugin stack. Though if seen as a requirement for security context mutations it may make these mutations overly complex. Considerations: How many mutation plugins/validators would be needed to handle separation of concern issues; also the ole problem of every hook/plugin wants to be last in the chain to consider.

for validation plugin it does not matter the order, right? the result is the AND of all plugins, or all are OK with the change or a single DENY will be enought

aojea avatar Mar 06 '25 21:03 aojea

for validation plugin it does not matter the order, right? the result is the AND of all plugins, or all are OK with the change or a single DENY will be enought

Correct. Validation plugins would not be able to perform mutation, and would need to run after all mutating plugins are complete. We could even dispatch to validating plugins in parallel and AND the results together.

samuelkarp avatar Mar 06 '25 21:03 samuelkarp

Where order is not important we are probably already there on the container side with the post create/start container calls with error response. For run pod we probably need either a post run and/or consider splitting run into create start operations and add post create post start for pod as well. Or.. door number 3 only allow validation on the pod's pause container post create... may have issues later in door 3 if we ever remove the pause container and replace it with a task.

mikebrow avatar Mar 07 '25 16:03 mikebrow

Where order is not important we are probably already there on the container side with the post create/start container calls with error response. For run pod we probably need either a post run and/or consider splitting run into create start operations and add post create post start for pod as well. Or.. door number 3 only allow validation on the pod's pause container post create... may have issues later in door 3 if we ever remove the pause container and replace it with a task.

I think that we only need validation of those requests for which a plugin can respond with changes to some otherwise immutable parameter of the pod or container. Currently the only such request is CreateContainer.

klihub avatar Mar 07 '25 17:03 klihub

Where order is not important we are probably already there on the container side with the post create/start container calls with error response. For run pod we probably need either a post run and/or consider splitting run into create start operations and add post create post start for pod as well. Or.. door number 3 only allow validation on the pod's pause container post create... may have issues later in door 3 if we ever remove the pause container and replace it with a task.

I think that we only need validation of those requests for which a plugin can respond with changes to some otherwise immutable parameter of the pod or container. Currently the only such request is CreateContainer.

nod..

For pod validation, one concern was when crio is running in task only / non pause container mode .. would post create container be sufficient then for the validation cases where there is no actual pause container.. Or do we say you can't mutate the pause container?

For container mutation validation I have some low level thoughts spinning around in my head, due primarily to the oci hook calls, ... Is there no case where one might want to do plugin validation post start vs post create.

mikebrow avatar Mar 07 '25 17:03 mikebrow

One of the desires and use cases expressed in https://github.com/containerd/nri/issues/142#issuecomment-2661405662 is to "block pod creation until something is ready", RunPodSandbox is a hook that people that operates at the Pod level (networking specially between others) wants to have the capability to block on RunPodSandbox. This is also important because in Kubernetes is the first of the rpcs to create a Pod,

Image

aojea avatar Mar 08 '25 08:03 aojea

One of the desires and use cases expressed in #142 (comment) is to "block pod creation until something is ready", RunPodSandbox is a hook that people that operates at the Pod level (networking specially between others) wants to have the capability to block on RunPodSandbox. This is also important because in Kubernetes is the first of the rpcs to create a Pod

@aojea I think the most appealing way forward for this would be using (and allowing NRI to affect) existing mechanisms, most prominently CRI runtime conditions (RuntimeReady, NetworkReady, potentially some new one, for instance indicating whether mandatory plugins configured by the cluster admin are present). And maybe additionally allowing the runtime to propagate/remove taints to/from the node (but this only makes sense if it can be done via the kubelet through CRI).

klihub avatar Apr 14 '25 09:04 klihub