gateway-api Route inclusion/delegation security and policy implications

Route inclusion/delegation security and policy implications

Open costinm opened this issue 2 years ago • 6 comments

https://docs.google.com/document/d/1iumYQQMaB9y-PCtMsRwVfzKlZXZDejr_fO0krXn3F90/edit defines a mechanism to delegate and include routes.

There are several security and policy concerns:

what happens with attached policies, like security ? How are they merged or applied ?
for the common / recommended case - where users write label selectors using the namespace - this is equivalent with a list of namespace (but in a very verbose syntax - UX is not ideal ). If they deviate and use arbitrary labels - it becomes a security risk, and it also requires the owner of the HttpRoutes that are attaching to have namespace label edit permission, which in turn creates additional risks.
the merging semantics - where all fields/filters/attached policies may be impacted - may also introduce subtle routing or policy problems.

Mar 14 '22 23:03 costinm

Also I would like some discussion if this is really a core feature that all implementations must support, or can be expressed as an optional feature - and use a separate resource and the normal policy attachment.

For example, it could be a 'example.com Delegate' CR, that is attached to a HttpRoute and defines any label selector you want, with implementations supporting this extended policy implementing what is described in the proposal, i.e. merge routes based on the labels to the main route.

Mar 14 '22 23:03 costinm

Thanks for raising these points, @costinm, this is a great discussion.

Okay, easy one first.

Also I would like some discussion if this is really a core feature that all implementations must support, or can be expressed as an optional feature...

I think we can easily make this an optional feature by making the allowedPorts field on Routes have Extended support. That means that it's not necessary to support the feature, but if you do, we will mandate how the behavior works. (That's distinct from ImplementationSpecific fields, where we are saying that implementations can do whatever we want.)

I am a -1 on either having a separate CR, using another struct that allowedRoutes, or other solutions, because I think that having the Route<->Route attachment process be the same a the Gateway<->Route attachment process is such a big usability win. In fact, I would go so far as to say that Routes attaching to Gateways is functionally including the Route config inside the Gateway object, or (if the Route owner and Gateway owner are separate users) delegating control over that listener to the attached Route objects. If you think of it this way, reusing the same attachment method is very clear.

At the end of the day, all of these constructs are a way to break up a data plane configuration to make it easy for multiple personas to control different parts of it. Routes are not routing hops, they're a mechanism to ensure we meet our goals of separating roles and being able to migrate between implementations.

In terms of merging semantics for the Routes themselves, we already have merging semantics defined for the Gateway<->Route process, we can reuse the same rules to manage conflict resolution for the basic Route<->Route attachments as well.

In terms of the more complex interactions, like policy attachment, I agree that it's very possible to create conflicts in many ways. I think that we need to be clear that implementations can and should disallow attachments that produce outcomes like this. Since Policy attachment objects are currently by definition implementation specific, the handling of the behavior there must also be implementation specific. I'll grant that we could provide some better guidance in the Policy attachment documentation about what to do with sibling policies that conflict - but the policy attachment documentation is the place to do that, not here.

You didn't mention it in the above comments, but we discussed cross-namespace references in the meeting, and I think it's worth reiterating why we have this two-way handshake mechanism - explicitly it's to allow cross namespace references in as secure a way as possible. The referring object (in this case the child Route attaching to a parent, whether Gateway or Route) must specify the exact name of the thing it wants to attach to, while the parent can (and must in the case of Route<->Route attachment) specify what it will allow. It's requiring the referred object to also consent to the attachment that makes the cross-namespace references okay. (And that's why we built the ReferencePolicy object for cross-namespace references to things that we can't change the spec of, like Secret or Service.)

In terms of having an explicit list of named namespaces instead of a label selector, my concerns (and the reasons that these are not recommended in other API resources) are:

Using an explicit list can easily run up against object size limits if there are a lot of namespaces (and if we don't allow an arbitrary number, we're not serving people with very large clusters)
More importantly, using an explicit list means that adding a new namespace that a parent can accept routes from needs two operations: a cluster admin must label the namespace, and a Gateway admin must edit the object. For a busy cluster with namespace churn, being able to use namespace selectors allows this to be handled dynamically. (Does this suck for implementors? Absolutely. But the users that need that level of dynamic functionality will really need it).

That was a long commment, sorry.

tl;dr I think that using the same allowedRoutes construct for Gateway<->Route and Route<->Route attachment solves significantly more problems than it creates.

Mar 16 '22 04:03 youngnick

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

After 90d of inactivity, lifecycle/stale is applied
After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

Mark this issue or PR as fresh with /remove-lifecycle stale
Mark this issue or PR as rotten with /lifecycle rotten
Close this issue or PR with /close
Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

Jun 14 '22 05:06 k8s-triage-robot

This is held up along with the actual inclusion work until after v0.5.0, keeping unstale.

/remove-lifecycle stale

Jun 16 '22 01:06 youngnick

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

After 90d of inactivity, lifecycle/stale is applied
After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

Mark this issue or PR as fresh with /remove-lifecycle stale
Mark this issue or PR as rotten with /lifecycle rotten
Close this issue or PR with /close
Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

Sep 14 '22 01:09 k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

After 90d of inactivity, lifecycle/stale is applied
After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

Mark this issue or PR as fresh with /remove-lifecycle rotten
Close this issue or PR with /close
Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

Oct 14 '22 01:10 k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

After 90d of inactivity, lifecycle/stale is applied
After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

Reopen this issue with /reopen
Mark this issue as fresh with /remove-lifecycle rotten
Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

Nov 13 '22 02:11 k8s-triage-robot

@k8s-triage-robot: Closing this issue, marking it as "Not Planned".

In response to this:

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

After 90d of inactivity, lifecycle/stale is applied

After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied

After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

Reopen this issue with /reopen

Mark this issue as fresh with /remove-lifecycle rotten

Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Nov 13 '22 02:11 k8s-ci-robot

gateway-api gateway-api copied to clipboard

Route inclusion/delegation security and policy implications

gateway-api
gateway-api copied to clipboard