
Redirect issues with Workspaces and standalone virtual-workspace server

Open ncdc opened this issue 3 years ago • 15 comments

When the partial metadata informer is trying to request /clusters/*/apis/tenancy.kcp.dev/v1beta1/workspaces from the kcp process, this gets redirected to the front proxy, and we see this error

W0819 11:41:35.676289       1 reflector.go:324] k8s.io/[email protected]/tools/cache/reflector.go:167: failed to list *unstructured.Unstructured: Get "https://<external url>/services/workspaces/%2A/apis/tenancy.kcp.dev/v1beta1/workspaces?limit=500&resourceVersion=0": x509: certificate is valid for [REDACTED], not apiserver-loopback-client

We need to figure out how to handle situations where the loopback client might get redirected to another URL where it can't validate the certificate.

Originally posted by @ncdc in https://github.com/kcp-dev/kcp/issues/1654#issuecomment-1220579059

ncdc avatar Aug 19 '22 19:08 ncdc

FYI @sttts @p0lyn0mial @stevekuznetsov @csams from our discussion today

ncdc avatar Aug 19 '22 19:08 ncdc

Unfortunately, this issue also hits any controller that has to reach an external VW server, such as the apibinding_deletion_controller (these controllers use c.identityConfig = rest.CopyConfig(c.GenericConfig.LoopbackClientConfig)):

I0824 13:15:01.340989   13424 apibinding_deletion_controller.go:328] "patching APIBinding" reconciler="kcp-apibindingdeletion" key="root:e2e-org-hchc2|tenancy.kcp.dev" apibinding.workspace="root:e2e-org-hchc2" apibinding.namespace="" apibinding.name="tenancy.kcp.dev" apibinding.apiVersion="" patch="{\"metadata\":{\"resourceVersion\":\"1573\",\"uid\":\"b6ee73d3-806c-4a65-818c-90db80ba406a\"},\"status\":{\"conditions\":[{\"lastTransitionTime\":\"2022-08-24T11:14:54Z\",\"status\":\"True\",\"type\":\"Ready\"},{\"lastTransitionTime\":\"2022-08-24T11:14:54Z\",\"status\":\"True\",\"type\":\"APIExportValid\"},{\"lastTransitionTime\":\"2022-08-24T11:15:01Z\",\"message\":\"Get \\\"https://127.0.0.1:6444/services/workspaces/root:e2e-org-hchc2/apis/tenancy.kcp.dev/v1beta1/workspaces\\\": x509: certificate is valid for localhost, not apiserver-loopback-client\",\"reason\":\"ResourceDeletionFailed\",\"severity\":\"Error\",\"status\":\"False\",\"type\":\"BindingResourceDeleteSuccess\"},{\"lastTransitionTime\":\"2022-08-24T11:14:54Z\",\"status\":\"True\",\"type\":\"BindingUpToDate\"},{\"lastTransitionTime\":\"2022-08-24T11:14:54Z\",\"status\":\"True\",\"type\":\"InitialBindingCompleted\"},{\"lastTransitionTime\":\"2022-08-24T11:14:54Z\",\"status\":\"True\",\"type\":\"PermissionClaimsApplied\"},{\"lastTransitionTime\":\"2022-08-24T11:14:54Z\",\"status\":\"True\",\"type\":\"PermissionClaimsValid\"}]}}"
E0824 13:15:01.360752   13424 apibinding_deletion_controller.go:178] deletion of apibinding root:e2e-org-hchc2|tenancy.kcp.dev failed: Get "https://127.0.0.1:6444/services/workspaces/root:e2e-org-hchc2/apis/tenancy.kcp.dev/v1beta1/workspaces": x509: certificate is valid for localhost, not apiserver-loopback-client
I0824 13:15:01.579106   13424 apibinding_deletion_controller.go:328] "patching APIBinding" reconciler="kcp-apibindingdeletion" key="root:e2e-org-hchc2:e2e-workspace-25cmk|tenancy.kcp.dev" apibinding.workspace="root:e2e-org-hchc2:e2e-workspace-25cmk" apibinding.namespace="" apibinding.name="tenancy.kcp.dev" apibinding.apiVersion="" patch="{\"metadata\":{\"resourceVersion\":\"1588\",\"uid\":\"21aec9fe-5d52-47c9-9e5e-134e37b8f13e\"},\"status\":{\"conditions\":[{\"lastTransitionTime\":\"2022-08-24T11:14:54Z\",\"status\":\"True\",\"type\":\"Ready\"},{\"lastTransitionTime\":\"2022-08-24T11:14:54Z\",\"status\":\"True\",\"type\":\"APIExportValid\"},{\"lastTransitionTime\":\"2022-08-24T11:15:01Z\",\"message\":\"Get \\\"https://127.0.0.1:6444/services/workspaces/root:e2e-org-hchc2:e2e-workspace-25cmk/apis/tenancy.kcp.dev/v1beta1/workspaces\\\": x509: certificate is valid for localhost, not apiserver-loopback-client\",\"reason\":\"ResourceDeletionFailed\",\"severity\":\"Error\",\"status\":\"False\",\"type\":\"BindingResourceDeleteSuccess\"},{\"lastTransitionTime\":\"2022-08-24T11:14:54Z\",\"status\":\"True\",\"type\":\"BindingUpToDate\"},{\"lastTransitionTime\":\"2022-08-24T11:14:54Z\",\"status\":\"True\",\"type\":\"InitialBindingCompleted\"},{\"lastTransitionTime\":\"2022-08-24T11:14:55Z\",\"status\":\"True\",\"type\":\"PermissionClaimsApplied\"},{\"lastTransitionTime\":\"2022-08-24T11:14:55Z\",\"status\":\"True\",\"type\":\"PermissionClaimsValid\"}]}}"
E0824 13:15:01.581461   13424 apibinding_deletion_controller.go:178] deletion of apibinding root:e2e-org-hchc2|tenancy.kcp.dev failed: Get "https://127.0.0.1:6444/services/workspaces/root:e2e-org-hchc2/apis/tenancy.kcp.dev/v1beta1/workspaces": x509: certificate is valid for localhost, not apiserver-loopback-client
E0824 13:15:01.666039   13424 apibinding_deletion_controller.go:178] deletion of apibinding root:e2e-org-hchc2:e2e-workspace-25cmk|tenancy.kcp.dev failed: Get "https://127.0.0.1:6444/services/workspaces/root:e2e-org-hchc2:e2e-workspace-25cmk/apis/tenancy.kcp.dev/v1beta1/workspaces": x509: certificate is valid for localhost, not apiserver-loopback-client
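
The root cause is that rest.CopyConfig(LoopbackClientConfig) keeps the loopback host, CA, and TLS server name (apiserver-loopback-client), none of which match the serving certificate of the front proxy or the standalone VW server. A minimal sketch, with hypothetical vwBaseURL/vwCAFile/vwServerName values, of how such a controller could instead be rebased onto the externally reachable VW endpoint:

```go
package vwclient

import "k8s.io/client-go/rest"

// vwClientConfig is a hypothetical helper: it keeps the loopback config's
// auth material but repoints host, CA, and TLS server name at the externally
// reachable virtual-workspace endpoint, so certificate verification can
// actually succeed. All three string parameters are illustrative assumptions.
func vwClientConfig(loopback *rest.Config, vwBaseURL, vwCAFile, vwServerName string) *rest.Config {
	cfg := rest.CopyConfig(loopback)              // keep token, user agent, rate limits
	cfg.Host = vwBaseURL                          // e.g. https://<front proxy>/services/workspaces
	cfg.TLSClientConfig.CAData = nil              // drop the loopback-only CA
	cfg.TLSClientConfig.CAFile = vwCAFile         // CA that signed the VW/front-proxy serving cert
	cfg.TLSClientConfig.ServerName = vwServerName // must appear in that cert's SANs
	return cfg
}
```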

It also affects users accessing workspaces, e.g.:

 k get --raw '/clusters/root/apis/tenancy.kcp.dev/v1beta1/workspaces'
Unable to connect to the server: x509: certificate is valid for 192.168.32.104, not 127.0.0.1

p0lyn0mial avatar Aug 24 '22 11:08 p0lyn0mial

Maybe we need a proxy rather than a simple redirection? The proxy could verify the vw server cert.
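
A minimal sketch of that idea, assuming a hypothetical VW address and CA path: instead of answering /services/workspaces/... with a redirect, the serving process reverse-proxies the request and verifies the VW serving certificate itself, so callers never have to validate a certificate for a host they did not dial:

```go
package main

import (
	"crypto/tls"
	"crypto/x509"
	"log"
	"net/http"
	"net/http/httputil"
	"net/url"
	"os"
)

func main() {
	// Hypothetical standalone VW address and the CA that signed its serving cert.
	vwURL, err := url.Parse("https://vw.internal.example:6444")
	if err != nil {
		log.Fatal(err)
	}
	caPEM, err := os.ReadFile("/etc/kcp/vw-ca.crt")
	if err != nil {
		log.Fatal(err)
	}
	pool := x509.NewCertPool()
	if !pool.AppendCertsFromPEM(caPEM) {
		log.Fatal("no CA certificates found in vw-ca.crt")
	}

	proxy := httputil.NewSingleHostReverseProxy(vwURL)
	proxy.Transport = &http.Transport{
		// The proxy, not the end client, verifies the VW serving certificate.
		TLSClientConfig: &tls.Config{RootCAs: pool},
	}

	// Forward workspace requests instead of redirecting them.
	http.Handle("/services/workspaces/", proxy)
	log.Fatal(http.ListenAndServe(":8080", nil))
}
```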

p0lyn0mial avatar Aug 24 '22 11:08 p0lyn0mial

It also affects users accessing workspaces, e.g.:

k get --raw '/clusters/root/apis/tenancy.kcp.dev/v1beta1/workspaces'
Unable to connect to the server: x509: certificate is valid for 192.168.32.104, not 127.0.0.1

This is dependent upon the deployment topology. We have a topology where this URL redirects to the front proxy (which then maps it into the virtual workspaces container). In this setup, a client is able to validate the front proxy's certificate correctly and everything works.

ncdc avatar Aug 24 '22 12:08 ncdc

This is dependent upon the deployment topology. We have a topology where this URL redirects to the front proxy (which then maps it into the virtual workspaces container)

Do we really do that? That's wrong. We must go directly to the vw address.

sttts avatar Aug 30 '22 07:08 sttts

When the partial metadata informer is trying to request /clusters/*/apis/tenancy.kcp.dev/v1beta1/workspaces from the kcp process, this gets redirected to the front proxy, and we see this error

One step back: do we actually want the ddsif to list projections?

sttts avatar Aug 30 '22 08:08 sttts

To sum up a discussion with Stefan in Slack: the ddsif and the apibinding_deletion_controller shouldn't use projection resources. Instead of Workspaces, they should use ClusterWorkspaces.

For local development, we should use some domain name so that we can validate the server. It should be possible since we have self-signed certs and CAs.
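
A minimal sketch of that local-development setup, with hypothetical names (a dev hostname such as kcp.dev.local pointed at 127.0.0.1 via /etc/hosts): generate the self-signed serving certificate with DNS SANs that clients can actually verify against, then have the kubeconfig dial that hostname:

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/x509"
	"crypto/x509/pkix"
	"encoding/pem"
	"log"
	"math/big"
	"net"
	"os"
	"time"
)

func main() {
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		log.Fatal(err)
	}
	tmpl := &x509.Certificate{
		SerialNumber:          big.NewInt(1),
		Subject:               pkix.Name{CommonName: "kcp-dev"},
		NotBefore:             time.Now(),
		NotAfter:              time.Now().Add(365 * 24 * time.Hour),
		KeyUsage:              x509.KeyUsageDigitalSignature | x509.KeyUsageCertSign,
		ExtKeyUsage:           []x509.ExtKeyUsage{x509.ExtKeyUsageServerAuth},
		IsCA:                  true,
		BasicConstraintsValid: true,
		// The dev hostname is the point: clients verify this name instead of
		// tripping over an IP that is missing from the certificate.
		DNSNames:    []string{"kcp.dev.local", "localhost"},
		IPAddresses: []net.IP{net.ParseIP("127.0.0.1")},
	}
	der, err := x509.CreateCertificate(rand.Reader, tmpl, tmpl, &key.PublicKey, key)
	if err != nil {
		log.Fatal(err)
	}
	if err := pem.Encode(os.Stdout, &pem.Block{Type: "CERTIFICATE", Bytes: der}); err != nil {
		log.Fatal(err)
	}
}
```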

p0lyn0mial avatar Aug 30 '22 08:08 p0lyn0mial

https://github.com/kcp-dev/kcp/pull/1805 stops the ddsif from using v1beta1 Workspaces. But this is hard-coded for the time being.

ncdc avatar Aug 30 '22 17:08 ncdc

But this is hard-coded for the time being.

This is fine for now until we come up with a generic projection concept.

sttts avatar Aug 30 '22 18:08 sttts

What we could do now is introduce a pkg/projection package that holds a list of the projected GRs and a map of what each one maps to.
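
A minimal sketch of what such a package could look like (names and shapes are illustrative, not kcp's actual implementation; the GRs are modeled as GroupVersionResources here for concreteness):

```go
// Package projection keeps one authoritative record of projected resources and
// of the resources that back them, so generic machinery (ddsif, deletion
// controllers) can skip or translate projections instead of listing them.
package projection

import "k8s.io/apimachinery/pkg/runtime/schema"

// projectedToBacking maps a projected (virtual) resource to the resource that
// actually stores its data, e.g. v1beta1 Workspaces are a projection of
// v1alpha1 ClusterWorkspaces.
var projectedToBacking = map[schema.GroupVersionResource]schema.GroupVersionResource{
	{Group: "tenancy.kcp.dev", Version: "v1beta1", Resource: "workspaces"}: {
		Group: "tenancy.kcp.dev", Version: "v1alpha1", Resource: "clusterworkspaces",
	},
}

// Includes reports whether gvr is projected and should therefore not be
// listed or watched directly by generic controllers.
func Includes(gvr schema.GroupVersionResource) bool {
	_, ok := projectedToBacking[gvr]
	return ok
}

// BackingFor returns the resource backing a projected GVR, if there is one.
func BackingFor(gvr schema.GroupVersionResource) (schema.GroupVersionResource, bool) {
	backing, ok := projectedToBacking[gvr]
	return backing, ok
}
```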

sttts avatar Aug 30 '22 18:08 sttts

On it (need it to fix another issue)

ncdc avatar Aug 30 '22 19:08 ncdc

#1860

ncdc avatar Aug 30 '22 20:08 ncdc

Cleared milestone and put in backlog

ncdc avatar Oct 05 '22 14:10 ncdc

Issues go stale after 90d of inactivity. After a further 30 days, they will turn rotten. Mark the issue as fresh with /remove-lifecycle stale.

If this issue is safe to close now please do so with /close.

/lifecycle stale

kcp-ci-bot avatar Apr 12 '24 20:04 kcp-ci-bot

Stale issues rot after 30d of inactivity. Mark the issue as fresh with /remove-lifecycle rotten. Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

/lifecycle rotten

kcp-ci-bot avatar May 12 '24 20:05 kcp-ci-bot

Rotten issues close after 30d of inactivity. Reopen the issue with /reopen. Mark the issue as fresh with /remove-lifecycle rotten.

/close

kcp-ci-bot avatar Jun 11 '24 20:06 kcp-ci-bot

@kcp-ci-bot: Closing this issue.

In response to this:

Rotten issues close after 30d of inactivity. Reopen the issue with /reopen. Mark the issue as fresh with /remove-lifecycle rotten.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

kcp-ci-bot avatar Jun 11 '24 20:06 kcp-ci-bot