
Add e2e to ensure the Kubernetes namespace controller is functional


The Kubernetes namespace controller is responsible for deleting all resources from a namespace when that namespace is deleted. Because kcp has resources in multiple logical clusters, we need to make sure the namespace controller can delete resources from namespaces across multiple logical clusters. It's possible we'll accidentally break this in the future as we refactor and improve kcp's code, so let's make sure we have an end-to-end test that will catch a regression in the namespace controller.
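For reference, a minimal sketch of what such a test could look like, written against plain client-go. `newClusterClient` and the workspace paths are hypothetical placeholders for however the kcp test suite builds a client scoped to one logical cluster:

```go
// A minimal sketch of the proposed e2e test, assuming a per-logical-cluster
// client helper exists in the test fixture (newClusterClient is hypothetical).
package e2e

import (
	"context"
	"testing"
	"time"

	corev1 "k8s.io/api/core/v1"
	apierrors "k8s.io/apimachinery/pkg/api/errors"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/util/wait"
	"k8s.io/client-go/kubernetes"
)

func TestNamespaceControllerAcrossLogicalClusters(t *testing.T) {
	ctx := context.Background()

	for _, cluster := range []string{"root:org:ws1", "root:org:ws2"} { // hypothetical workspace paths
		client := newClusterClient(t, cluster)

		// Create a namespace and put something in it.
		ns, err := client.CoreV1().Namespaces().Create(ctx,
			&corev1.Namespace{ObjectMeta: metav1.ObjectMeta{GenerateName: "nsc-e2e-"}},
			metav1.CreateOptions{})
		if err != nil {
			t.Fatalf("creating namespace in %s: %v", cluster, err)
		}
		if _, err := client.CoreV1().ConfigMaps(ns.Name).Create(ctx,
			&corev1.ConfigMap{ObjectMeta: metav1.ObjectMeta{Name: "probe"}},
			metav1.CreateOptions{}); err != nil {
			t.Fatalf("creating configmap in %s: %v", cluster, err)
		}

		// Delete the namespace and expect the namespace controller to clean up
		// its contents and remove the namespace itself.
		if err := client.CoreV1().Namespaces().Delete(ctx, ns.Name, metav1.DeleteOptions{}); err != nil {
			t.Fatalf("deleting namespace in %s: %v", cluster, err)
		}
		err = wait.PollUntilContextTimeout(ctx, time.Second, 2*time.Minute, true,
			func(ctx context.Context) (bool, error) {
				_, err := client.CoreV1().Namespaces().Get(ctx, ns.Name, metav1.GetOptions{})
				return apierrors.IsNotFound(err), nil
			})
		if err != nil {
			t.Fatalf("namespace %s in %s was never fully deleted: %v", ns.Name, cluster, err)
		}
	}
}

// newClusterClient is a placeholder: wire up per-logical-cluster client
// construction from the test fixture here.
func newClusterClient(t *testing.T, clusterPath string) kubernetes.Interface {
	t.Helper()
	t.Skipf("placeholder: no client for %s in this sketch", clusterPath)
	return nil
}
```

The exact fixture wiring doesn't matter; the point is that cleanup must be observed in each logical cluster independently.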

ncdc avatar Feb 01 '22 14:02 ncdc

Does this imply running a subset of upstream kube's conformance e2e, or would this testing be net-new?

marun avatar Feb 01 '22 14:02 marun

Long-term, it probably implies carving out a subset of "generic control plane" conformance tests from upstream. But if upstream never has multiple logical clusters, I think we'll still need to create kcp variants to ensure things work in a multi-logical-cluster setup. WDYT?

ncdc avatar Feb 01 '22 14:02 ncdc

These cross-workspace controllers with a dynamic RESTMapper are deeply broken right now. We cannot poll discovery for that. We have to come up with something else, e.g. using discovery for the native resources and a CRD informer for the rest.

sttts avatar Feb 01 '22 14:02 sttts

@sttts could you please elaborate on the discovery issues?

ncdc avatar Feb 01 '22 14:02 ncdc

We have a 60s poll interval updating a RESTMapper in a couple of places: in kube-apiserver (for admission, I guess) and in kube-controller-manager for GC, namespace, and quota. That RESTMapper is used to iterate over all known types, and it's critical for consistency.

Types change per workspace (different CRDs in different workspaces), so we need a different resource list for each. Doing that with polling is not a good idea; something we can watch would be much better.
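To make that concrete, here is a minimal sketch (not kcp's actual code) of a watch-driven, per-logical-cluster resource list: a fixed list of built-in GVRs plus a CRD informer, instead of a 60s discovery poll. The `resourceTracker` type, the `clusterNameOf` helper, and the `kcp.io/cluster` annotation key are assumptions for illustration only:

```go
package workspaceresources

import (
	"sync"

	apiextensionsv1 "k8s.io/apiextensions-apiserver/pkg/apis/apiextensions/v1"
	"k8s.io/apimachinery/pkg/runtime/schema"
	"k8s.io/client-go/tools/cache"
)

// resourceTracker keeps a per-logical-cluster set of CRD-provided GVRs, fed by a
// CRD informer instead of discovery polling. Controllers combine it with the
// fixed list of built-in resources when they iterate over "all known types".
type resourceTracker struct {
	mu     sync.RWMutex
	native []schema.GroupVersionResource                       // built-in types, identical in every workspace
	crds   map[string]map[schema.GroupVersionResource]struct{} // logical cluster -> CRD-provided GVRs
}

func newResourceTracker(native []schema.GroupVersionResource, crdInformer cache.SharedIndexInformer) *resourceTracker {
	rt := &resourceTracker{
		native: native,
		crds:   map[string]map[schema.GroupVersionResource]struct{}{},
	}
	crdInformer.AddEventHandler(cache.ResourceEventHandlerFuncs{
		AddFunc: func(obj interface{}) {
			if crd, ok := obj.(*apiextensionsv1.CustomResourceDefinition); ok {
				rt.upsert(crd)
			}
		},
		UpdateFunc: func(_, obj interface{}) {
			if crd, ok := obj.(*apiextensionsv1.CustomResourceDefinition); ok {
				rt.upsert(crd)
			}
		},
		DeleteFunc: func(obj interface{}) {
			if tombstone, ok := obj.(cache.DeletedFinalStateUnknown); ok {
				obj = tombstone.Obj
			}
			if crd, ok := obj.(*apiextensionsv1.CustomResourceDefinition); ok {
				rt.remove(crd)
			}
		},
	})
	return rt
}

// upsert records every served version of a CRD under its logical cluster. A real
// implementation would also prune versions that stopped being served.
func (rt *resourceTracker) upsert(crd *apiextensionsv1.CustomResourceDefinition) {
	cluster := clusterNameOf(crd)
	rt.mu.Lock()
	defer rt.mu.Unlock()
	if rt.crds[cluster] == nil {
		rt.crds[cluster] = map[schema.GroupVersionResource]struct{}{}
	}
	for _, v := range crd.Spec.Versions {
		if v.Served {
			rt.crds[cluster][schema.GroupVersionResource{Group: crd.Spec.Group, Version: v.Name, Resource: crd.Spec.Names.Plural}] = struct{}{}
		}
	}
}

func (rt *resourceTracker) remove(crd *apiextensionsv1.CustomResourceDefinition) {
	cluster := clusterNameOf(crd)
	rt.mu.Lock()
	defer rt.mu.Unlock()
	for _, v := range crd.Spec.Versions {
		delete(rt.crds[cluster], schema.GroupVersionResource{Group: crd.Spec.Group, Version: v.Name, Resource: crd.Spec.Names.Plural})
	}
}

// resourcesFor is what a controller like the namespace controller would iterate
// over for one logical cluster: the shared native list plus that cluster's CRDs.
func (rt *resourceTracker) resourcesFor(cluster string) []schema.GroupVersionResource {
	rt.mu.RLock()
	defer rt.mu.RUnlock()
	out := append([]schema.GroupVersionResource{}, rt.native...)
	for gvr := range rt.crds[cluster] {
		out = append(out, gvr)
	}
	return out
}

// clusterNameOf stands in for however kcp exposes the logical cluster of an
// object; the annotation key used here is only an assumption for the sketch.
func clusterNameOf(crd *apiextensionsv1.CustomResourceDefinition) string {
	return crd.Annotations["kcp.io/cluster"]
}
```

A controller such as the namespace controller would then call `resourcesFor(cluster)` per workspace instead of consulting a single, globally polled RESTMapper.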

sttts avatar Feb 01 '22 15:02 sttts

Clearing milestone to re-triage

ncdc avatar Apr 13 '22 18:04 ncdc

cc @qiujian16

sttts avatar May 05 '22 06:05 sttts

Issues go stale after 90d of inactivity. After a further 30 days, they will turn rotten. Mark the issue as fresh with /remove-lifecycle stale.

If this issue is safe to close now please do so with /close.

/lifecycle stale

kcp-ci-bot avatar Apr 12 '24 20:04 kcp-ci-bot

Stale issues rot after 30d of inactivity. Mark the issue as fresh with /remove-lifecycle rotten. Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

/lifecycle rotten

kcp-ci-bot avatar May 12 '24 20:05 kcp-ci-bot

Rotten issues close after 30d of inactivity. Reopen the issue with /reopen. Mark the issue as fresh with /remove-lifecycle rotten.

/close

kcp-ci-bot avatar Jun 11 '24 20:06 kcp-ci-bot

@kcp-ci-bot: Closing this issue.

In response to this:

Rotten issues close after 30d of inactivity. Reopen the issue with /reopen. Mark the issue as fresh with /remove-lifecycle rotten.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

kcp-ci-bot avatar Jun 11 '24 20:06 kcp-ci-bot