object count resource quotas not working and breaking other Tenant functionality

Open adabuleanu opened this issue 1 year ago • 3 comments

Bug description

The Tenant CR lets you define Kubernetes resource quotas. One such resource quota is object count; for example, the jobs count is defined by count/jobs.batch. With such a configuration the Tenant object is created, but applying the quota fails silently. Moreover, some of the other Tenant functionality does not work as expected. This was initially fixed in https://github.com/projectcapsule/capsule/issues/507 for the v1beta1 API, but the fix somehow was not carried over to v1beta2.

How to reproduce

  1. Create a Tenant resource:
apiVersion: capsule.clastix.io/v1beta2
kind: Tenant
metadata:
  name: dummytenant
spec:
  owners:
  - clusterRoles:
    - admin
    - capsule-namespace-deleter
    kind: ServiceAccount
    name: system:serviceaccount:ns:dummy
  resourceQuotas:
    items:
    - hard:
        count/jobs.batch: "2"
        limits.cpu: 300m
        limits.ephemeral-storage: 1Gi
        limits.memory: 1200Mi
    scope: Tenant
  2. Check the underlying ResourceQuota: it does not contain any jobs quota.
  3. Moreover, try to create a Namespace as the owner:
kubectl --as=system:serviceaccount:ns:dummy create ns dummytenant-ns1
  4. This creates the Namespace successfully, but when you try to access it with the same owner ServiceAccount, you are not allowed:
$ kubectl --as=system:serviceaccount:ns:dummy get ns dummytenant-ns1
Error from server (Forbidden): namespaces "dummytenant-ns1" is forbidden: User "system:serviceaccount:ns:dummy " cannot get resource "namespaces" in API group "" in the namespace "dummytenant-ns1"

This happens because Capsule does not create the RoleBindings associated with the new Namespace. If you check the RoleBindings, you will not see any created for that Namespace:

$ kubectl get rolebindings -A | grep dummytenant
  5. If you repeat all the steps above without the count/jobs.batch configuration, the RoleBindings are created as expected.

Expected behavior

  • Tenant should be able to define count/jobs.batch ResourceQuotas.
  • Errors related to one piece of configuration should not propagate to other functionality (see the Namespace example above).

Logs

Error logs

ResourceQuota \"capsule-dummytenant-0\" is invalid: [metadata.annotations: Invalid value: \"quota.capsule.clastix.io/hard-count/jobs.batch\": a qualified name must consist of alphanumeric characters, '-', '_' or '.', and must start and end with an alphanumeric character (e.g. 'MyName',  or 'my.name',  or '123-abc', regex used for validation is '([A-Za-z0-9][-A-Za-z0-9_.]*)?[A-Za-z0-9]') with an optional DNS subdomain prefix and '/' (e.g. 'example.com/MyName'), metadata.annotations: Invalid value: \"quota.capsule.clastix.io/used-count/jobs.batch\": a qualified name must consist of alphanumeric characters, '-', '_' or '.', and must start and end with an alphanumeric character (e.g. 'MyName',  or 'my.name',  or '123-abc', regex used for validation is '([A-Za-z0-9][-A-Za-z0-9_.]*)?[A-Za-z0-9]') with an optional DNS subdomain prefix and '/' (e.g. 'example.com/MyName')]
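The validation failure in the log above can be reproduced outside the cluster. The following is a minimal Python sketch of the qualified-name rule the error message describes (an optional DNS subdomain prefix, one '/', then a name part matching the quoted regex). The function name `is_qualified_name` is hypothetical and this is a re-implementation for illustration, not the Kubernetes API server's code. It shows that the extra '/' coming from count/jobs.batch is what makes the annotation key invalid.

```python
import re

# Name-part pattern, copied from the error message above.
NAME_RE = re.compile(r"([A-Za-z0-9][-A-Za-z0-9_.]*)?[A-Za-z0-9]")
# Optional prefix: a lowercase DNS subdomain (dot-separated labels).
DNS_SUBDOMAIN_RE = re.compile(
    r"[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*"
)


def is_qualified_name(key: str) -> bool:
    """Return True if `key` looks like a valid annotation key.

    Hypothetical helper: an approximation of the rule quoted in the log.
    """
    parts = key.split("/")
    if len(parts) == 1:
        name = parts[0]
    elif len(parts) == 2:
        prefix, name = parts
        if not prefix or len(prefix) > 253:
            return False
        if DNS_SUBDOMAIN_RE.fullmatch(prefix) is None:
            return False
    else:
        # More than one '/' can never be a valid key.
        return False
    return 0 < len(name) <= 63 and NAME_RE.fullmatch(name) is not None


if __name__ == "__main__":
    for key in (
        "quota.capsule.clastix.io/hard-limits.cpu",
        "quota.capsule.clastix.io/hard-count/jobs.batch",
    ):
        print(key, "->", "valid" if is_qualified_name(key) else "invalid")
```

Under these assumptions, the key derived from limits.cpu passes while the key derived from count/jobs.batch is rejected, which matches the two annotations named in the error.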

Additional context

  • Capsule version: v0.3.3
  • Helm Chart version: 0.4.3
  • Kubernetes version: v1.27.7

adabuleanu avatar Jan 12 '24 10:01 adabuleanu

Thanks for opening this, @adabuleanu.

I'll try to raise a PR to solve this; it would be great if you could give it a try!

prometherion avatar Jan 23 '24 19:01 prometherion

@adabuleanu I'm testing this by running Capsule v0.5.0 (ghcr.io/projectcapsule/capsule:v0.5.0) but I'm not able to replicate the issue.

$: kubectl get tnt dummytenant -o yaml
apiVersion: capsule.clastix.io/v1beta2
kind: Tenant
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"capsule.clastix.io/v1beta2","kind":"Tenant","metadata":{"annotations":{},"name":"dummytenant"},"spec":{"owners":[{"clusterRoles":["admin","capsule-namespace-deleter"],"kind":"ServiceAccount","name":"system:serviceaccount:ns:dummy"}],"resourceQuotas":{"items":[{"hard":{"count/jobs.batch":"2","limits.cpu":"300m","limits.ephemeral-storage":"1Gi","limits.memory":"1200Mi"}}],"scope":"Tenant"}}}
  creationTimestamp: "2024-01-23T19:43:42Z"
  generation: 2
  labels:
    kubernetes.io/metadata.name: dummytenant
  name: dummytenant
  resourceVersion: "3568938"
  uid: 5aa9eb2a-c78b-4805-9c41-1c7af6865afb
spec:
  ingressOptions:
    hostnameCollisionScope: Disabled
  limitRanges: {}
  networkPolicies: {}
  owners:
  - clusterRoles:
    - admin
    - capsule-namespace-deleter
    kind: ServiceAccount
    name: system:serviceaccount:ns:dummy
  resourceQuotas:
    items:
    - hard:
        count/jobs.batch: "2"
        limits.cpu: 300m
        limits.ephemeral-storage: 1Gi
        limits.memory: 1200Mi
    scope: Tenant
status:
  namespaces:
  - dummytenant-test
  size: 1
  state: Active

$: kubectl -n dummytenant-test get resourcequota
NAME                    AGE    REQUEST                 LIMIT
capsule-dummytenant-0   101s   count/jobs.batch: 0/2   limits.cpu: 0/300m, limits.ephemeral-storage: 0/1Gi, limits.memory: 0/1200Mi

I don't think there is a specific issue between the Tenant API versions, since we have a conversion webhook and the quota annotations are always the same across versions.
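To make the annotation point concrete, here is a minimal Python sketch of how a quota resource name such as count/jobs.batch could be turned into an annotation key that contains only one '/'. The helper name `quota_annotation_key` and the choice of replacing '/' with '_' are assumptions for illustration only. The thread does not show how Capsule actually builds these keys.

```python
def quota_annotation_key(kind: str, resource: str,
                         prefix: str = "quota.capsule.clastix.io") -> str:
    """Build an annotation key for a quota resource (hypothetical helper).

    `kind` is "hard" or "used". Any '/' in the resource name is replaced
    with '_' so the key keeps exactly one '/' (between prefix and name).
    Assumption: this sketch does not handle names longer than 63 characters.
    """
    safe_resource = resource.replace("/", "_")
    return f"{prefix}/{kind}-{safe_resource}"


if __name__ == "__main__":
    print(quota_annotation_key("hard", "count/jobs.batch"))
    print(quota_annotation_key("used", "limits.cpu"))
```

Because the key depends only on the resource name, the same key would be produced regardless of which Tenant API version the quota was declared in.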

I suspect you're using an old version of Capsule; you can find more details in the first log lines emitted when the Capsule pod starts up.

{"level":"info","ts":"2024-01-23T19:48:12.764Z","logger":"setup","msg":"Capsule Version v0.5.0 74d3ac5dirty"}
{"level":"info","ts":"2024-01-23T19:48:12.764Z","logger":"setup","msg":"Build from: https://github.com/projectcapsule/capsule"}
{"level":"info","ts":"2024-01-23T19:48:12.764Z","logger":"setup","msg":"Build date: "}
{"level":"info","ts":"2024-01-23T19:48:12.764Z","logger":"setup","msg":"Go Version: go1.20.11"}
{"level":"info","ts":"2024-01-23T19:48:12.764Z","logger":"setup","msg":"Go OS/Arch: linux/amd64"}

prometherion avatar Jan 23 '24 19:01 prometherion

I was able to reproduce this issue with 0.3.3 in a customer environment. @adabuleanu were you able to upgrade to a newer release?

oliverbaehler avatar Feb 13 '24 12:02 oliverbaehler