kcp
bug: omitting the identity hash in APIExport / APIBinding permission claims fails silently
Describe the bug
I had an APIExport that included permissionClaims for deployments/apps. Deployments are not a built-in type and are only available in a given workspace after setting up a SyncTarget to a pcluster (which creates an APIExport named kubernetes).
$ k get apiexports kubernetes -o yaml
apiVersion: apis.kcp.dev/v1alpha1
kind: APIExport
metadata:
  annotations:
    kcp.dev/cluster: root
    workload.kcp.dev/skip-default-object-creation: "true"
  creationTimestamp: "2022-10-07T12:17:03Z"
  generation: 3
  name: kubernetes
  resourceVersion: "506"
  uid: e3af3dac-d89a-48ff-8921-5fd0826a7cd3
spec:
  identity:
    secretRef:
      name: kubernetes
      namespace: kcp-system
  latestResourceSchemas:
  - rev-503.deployments.apps
status:
  conditions:
  - lastTransitionTime: "2022-10-07T12:17:03Z"
    status: "True"
    type: IdentityValid
  - lastTransitionTime: "2022-10-07T12:17:03Z"
    status: "True"
    type: VirtualWorkspaceURLsReady
  identityHash: 5fdf7c7aaf407fd1594566869803f565bb84d22156cef5c445d2ee13ac2cfca6
  virtualWorkspaces:
  - url: https://192.168.1.189:6443/services/apiexport/root/kubernetes
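For reference, the hash can be pulled straight out of the export's status shown above, e.g. with a jsonpath query:
$ k get apiexports kubernetes -o jsonpath='{.status.identityHash}'
5fdf7c7aaf407fd1594566869803f565bb84d22156cef5c445d2ee13ac2cfca6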
The permissionClaims in my APIExport looked like the following:
permissionClaims:
- group: ""
  resource: secrets
- group: ""
  resource: namespaces
- group: ""
  resource: configmaps
- group: apps
  # (SHOULD HAVE THIS) identityHash: 5fdf7c7aaf407fd1594566869803f565bb84d22156cef5c445d2ee13ac2cfca6
  resource: deployments
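With the identity hash taken from the kubernetes export's status above, the claim on the apps group should instead have looked like:
- group: apps
  identityHash: 5fdf7c7aaf407fd1594566869803f565bb84d22156cef5c445d2ee13ac2cfca6
  resource: deployments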
Because the identity hash was not provided, Deployments did not show up in the virtual workspace for the exported types when I created an APIBinding for the export.
$ k -s https://192.168.1.189:6443/services/apiexport/root/kxp.crossplane.io/clusters/* api-resources
NAME          SHORTNAMES   APIVERSION   NAMESPACED   KIND
configmaps                 v1           true         ConfigMap
namespaces                 v1           false        Namespace
secrets                    v1           true         Secret
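With the identityHash added to the claim, I would expect deployments to show up in the same listing, roughly like the following (illustrative output, not captured from a real run):
NAME          SHORTNAMES   APIVERSION   NAMESPACED   KIND
configmaps                 v1           true         ConfigMap
deployments   deploy       apps/v1      true         Deployment
namespaces                 v1           false        Namespace
secrets                    v1           true         Secret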
However, the failure to apply the permission claim was not reflected in the export status (see above) or in the binding:
conditions:
- lastTransitionTime: "2022-10-07T12:54:01Z"
  status: "True"
  type: Ready
- lastTransitionTime: "2022-10-07T12:54:01Z"
  status: "True"
  type: APIExportValid
- lastTransitionTime: "2022-10-07T12:54:01Z"
  status: "True"
  type: BindingUpToDate
- lastTransitionTime: "2022-10-07T12:54:01Z"
  status: "True"
  type: InitialBindingCompleted
- lastTransitionTime: "2022-10-07T12:54:01Z"
  status: "True"
  type: PermissionClaimsApplied
- lastTransitionTime: "2022-10-07T12:54:01Z"
  status: "True"
  type: PermissionClaimsValid
exportPermissionClaims:
- resource: secrets
- resource: namespaces
- resource: configmaps
- group: apps
  resource: deployments
Steps To Reproduce
(these are example steps; the same could be accomplished via a permissionClaim that doesn't reference an identity hash for any type provided by an APIExport)
- Setup a SyncTarget with the kubernetes APIExport.
- Create an APIExport referencing deployments/apps (a minimal sketch follows this list).
- Create an APIBinding to a workspace that references the APIExport and accepts the permissionClaims.
- Observe that the type is not available in the virtual workspace for the APIExport.
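A minimal sketch of the export from the second step might look like the following, using the kxp.crossplane.io name that appears in the api-resources command above; the important part is the claim on apps/deployments with no identityHash:
apiVersion: apis.kcp.dev/v1alpha1
kind: APIExport
metadata:
  name: kxp.crossplane.io
spec:
  permissionClaims:
  - group: apps
    resource: deployments  # missing identityHash, so the claim is silently not served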
Expected Behaviour
I would expect the APIExport either to be rejected on creation (via admission) or to have the failure surfaced in conditions. There is a bit of nuance here around the different types of errors for the export / binding. For example:
- something like admission could be used to say "this isn't a built-in type, so you need some identity hash"
- something like conditions could be used to say "your identity hash is incorrect" (this may already be the case, need to double check)
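As an illustration of the second point, the binding's status could surface something like the condition below (the reason and message are made up to show the shape, not an existing kcp condition):
conditions:
- message: permission claim for deployments.apps needs an identityHash because it is not a built-in type
  reason: IdentityHashRequired
  status: "False"
  type: PermissionClaimsValid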
Additional Context
No response
/assign
As I was debugging this for my own situation, the export was partially working: native types like secrets and configMaps responded, but SyncTargets were giving me 404 errors. That led me down quite a rabbit hole while debugging, and to false-positive controller behaviour where part of it was working but some flow was not.
I suspect this partially working state might create quite an interesting set of bugs/outages in production.
I would almost be tempted to go for a full deny of the APIExport, or to not serve it at all in the virtualWorkspace, if one of the claims is not working. This way we could avoid situations where it failed (even with a condition set to failed) but the user would not notice, and the system can be tricky to debug.
So maybe, for now, add status for visibility, plus a discussion on whether the current behaviour of serving partially is something we would want to change?
In addition, I think we need to extend APIExports and APIBindings with CLI columns to report status in the list view:
[mjudeikis@unknown faros-hub]$ kubectl get apibinding
NAME                        AGE
access.faros.sh             3h52m
apiresource.kcp.dev-1wked   4h3m
kubernetes                  3h56m
scheduling.kcp.dev-1gowj    4h3m
tenancy.kcp.dev-59phz       4h3m
workload.kcp.dev-4gqw7      4h3m
This output is not very helpful if something is failing. I think a separate issue is in order if one doesn't exist yet.
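Something like the following would make failures visible at a glance (the columns and values here are made up purely to illustrate the idea):
[mjudeikis@unknown faros-hub]$ kubectl get apibinding
NAME                        READY   REASON                    AGE
access.faros.sh             False   PermissionClaimsInvalid   3h52m
kubernetes                  True                              3h56m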
@mjudeikis thanks for the write-up!
I would almost be tempted to go for a full deny of the APIExport, or to not serve it at all in the virtualWorkspace, if one of the claims is not working.
I'm currently thinking of the following strategy:
- identity hash is not provided for non-internal type: deny at admission
- if partial export happens for some other reason: surface in conditions
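For the first path, the rejection could surface directly at apply time, along the lines of the following (hypothetical error text and file name, just to show the shape):
$ kubectl apply -f apiexport.yaml
The APIExport "kxp.crossplane.io" is invalid: spec.permissionClaims[0].identityHash: Required value: identityHash is required for non-built-in resource deployments.apps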
I could also see not surfacing the virtualWorkspace URL if the export is not functioning properly, but we would need to account for the case where an issue is intermittent. This is somewhat related to the discussion on https://github.com/kcp-dev/kcp/issues/1183.
I could also see not surfacing the virtualWorkspace URL if the export is not functioning properly
I would almost be tempted to go for a full deny of the APIExport, or to not serve it at all in the virtualWorkspace, if one of the claims is not working. This way we could avoid situations where it failed (even with a condition set to failed) but the user would not notice, and the system can be tricky to debug.
If I'm understanding correctly, you're both talking about some logic that would look kind of like what's in #2046, but in the controller and managing a condition or the virtualWorkspace URL. Is that right?
Issues go stale after 90d of inactivity.
After a further 30 days, they will turn rotten.
Mark the issue as fresh with /remove-lifecycle stale.
If this issue is safe to close now please do so with /close.
/lifecycle stale
Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
If this issue is safe to close now please do so with /close.
/lifecycle rotten
Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.
/close
@kcp-ci-bot: Closing this issue.
In response to this:
Rotten issues close after 30d of inactivity. Reopen the issue with /reopen. Mark the issue as fresh with /remove-lifecycle rotten. /close
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.