Provide deterministic bundle

Open Jiawei0227 opened this issue 1 year ago • 6 comments

The trust bundle generated by trust-manager should be deterministic: if the order of the CA sources changes, or if CAs from multiple Secrets are shuffled in the spec, the final trust bundle should always be the same, with the same ordering. Maybe by alphabetical order, expiration time, or something similar.

Jiawei0227 avatar Feb 28 '24 20:02 Jiawei0227

Cross link to: https://github.com/cert-manager/trust-manager/pull/303#issuecomment-1968587812

This means if a Bundle A looks like:

apiVersion: trust.cert-manager.io/v1alpha1
kind: Bundle
metadata:
  name: trust-store
spec:
  sources:
  - secret:
      key: ca.crt
      name: ca1
  - secret:
      key: ca.crt
      name: ca2
  target:
    configMap:
      key: ca.crt

and Bundle B looks like this:

apiVersion: trust.cert-manager.io/v1alpha1
kind: Bundle
metadata:
  name: trust-store
spec:
  sources:
  - secret:
      key: ca.crt
      name: ca2
  - secret:
      key: ca.crt
      name: ca1
  target:
    configMap:
      key: ca.crt

they should always produce the same content. Also, if we move one CA from ca1 -> ca2, the result should be the same.

Jiawei0227 avatar Feb 28 '24 20:02 Jiawei0227

@SgtCoDFish any thoughts on potentially getting this done? It sounds to me like we can just sort alphabetically when producing the final bundle, which should be good enough.

Jiawei0227 avatar Mar 01 '24 01:03 Jiawei0227

I don't currently have the bandwidth to implement this, but I'd be happy to review a PR which does it! My 2c would be to hash the DER-encoded certs and then order them alphanumerically based on the hex-encoded hash

SgtCoDFish avatar Mar 01 '24 10:03 SgtCoDFish

Not sure if anyone would be interested in picking this up, but this will be a critical feature. The reason is that our component mounts the trust bundle ConfigMap while other automation reconciles the bundle. If the bundle data keeps reordering, that will be very expensive and unnecessary.

Jiawei0227 avatar Mar 01 '24 22:03 Jiawei0227

But the order is consistent now, and that's good/required. Are you planning to shuffle the sources around @Jiawei0227? I'm not saying this shouldn't be fixed, but I don't consider it critical. 😸

erikgb avatar Mar 01 '24 22:03 erikgb

Hi, we have a Bundle which includes six configMaps using a label selector:

apiVersion: trust.cert-manager.io/v1alpha1
kind: Bundle
metadata:
  name: cluster-default-ca
spec:
  sources:
  - useDefaultCAs: true
  - configMap:
      selector:
        matchLabels: 
          cluster-default-ca: "true"
      key: "root-ca.pem"
  target:
    configMap:
      key: "root-certs.pem"
    additionalFormats:
      jks:
        key: "root-certs.jks"
      pkcs12:
        key: "root-certs.p12"

We also observe that the order of certificates changes with every reconciliation, leading to many unnecessary updates of the generated configmaps. While the output for root-certs.jks seems to be consistent, root-certs.pem and root-certs.p12 change with almost every reconciliation.

sebEg avatar Jul 09 '24 09:07 sebEg

@erikgb This caused three near misses for us. In two cases it caused etcd to run out of space within a short timespan (A). In two cases (one case had both) it caused etcd to use so much memory that our masters went OOM (B).

We use trust-manager to inject three configmaps (full CA certs) into each namespace. This happened in fairly small clusters (<30 namespaces). You can reproduce it by simply restarting trust-manager a few times. It will recreate those configmaps, etcd will grow by 1-2 GB and memory on the master will rise by about 3-6 GB.

Personally, I would consider this a critical issue. Currently, we have to massively overprovision our masters. On Azure this prevents you from using the smallest Kubernetes tier at all as it will kill the API. On AWS we had to roughly double our costs.

jan-kantert avatar Jul 09 '24 13:07 jan-kantert

I agree this is a serious issue, but probably relatively easy to fix. Any watchers that would like to try a PR to fix this?

erikgb avatar Jul 09 '24 13:07 erikgb

We investigated this a bit more. It turns out that our issue is caused by non-deterministic ordering in label selectors. This is what our bundle looks like:

apiVersion: trust.cert-manager.io/v1alpha1
kind: Bundle
metadata:
  name: xxx-default-ca
  namespace: my-ns
spec:
  sources:
  - useDefaultCAs: true
  - configMap:
      selector:
        matchLabels: 
          certs.infrastructure.mydomain.com/inject-default-ca: "true"
      key: "root-ca.pem"
  target:
    configMap:
      key: "root-certs.pem"
    additionalFormats:
      jks:
        key: "root-certs.jks"
      pkcs12:
        key: "root-certs.p12"

If we replace our selector with a static list this becomes less of an issue. With the selector the content seems to change every time.

> I agree this is a serious issue, but probably relatively easy to fix. Any watchers that would like to try a PR to fix this?

I will have a look if I can reproduce this in a test.

jan-kantert avatar Jul 09 '24 15:07 jan-kantert

What caused that issue?

arsenalzp avatar Jul 12 '24 12:07 arsenalzp

I see two different issues here:

  • On one hand, @Jiawei0227 was requesting a feature where reordering the sources array in the Bundle spec would not result in an update to the configmaps.
  • On the other hand, @jan-kantert and @sebEg found a serious bug that forces the target configmaps to be updated for no reason and seems to only occur when label selectors are used. This bug still needs a minimal working example.

I suggest that we don't conflate the feature request with the bug. I propose that the current issue (#310) keeps track of @Jiawei0227's feature request. @jan-kantert @sebEg can you create a separate issue for the bug you found?

maelvls avatar Jul 12 '24 14:07 maelvls

I fixed both cases in #380. Both issues have the same underlying cause. The second case just triggers the issue far more frequently, since Kubernetes will not order configmaps when they are loaded via a label selector. We could also sort those configmaps. That would fix the second issue (unless you rename configmaps). However, just ordering the certs will fix both issues at once. I can also add a test for the second case, but it will be fixed as well.
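To illustrate the difference between the two strategies, here is a small sketch. It is not the code from #380: the `source` type and both function names are hypothetical, and certificates are shortened to placeholder strings.

```go
package main

import (
	"sort"
	"strings"
)

// source is a hypothetical stand-in for a configmap matched by a label selector.
type source struct {
	name  string
	certs []string // placeholder strings standing in for PEM certificates
}

// bundleBySourceName sorts the sources by name, then concatenates their certs.
// The result is stable against list order but changes when a source is renamed.
func bundleBySourceName(sources []source) string {
	sorted := append([]source(nil), sources...) // copy, leave the input untouched
	sort.Slice(sorted, func(i, j int) bool { return sorted[i].name < sorted[j].name })
	var certs []string
	for _, s := range sorted {
		certs = append(certs, s.certs...)
	}
	return strings.Join(certs, "\n")
}

// bundleByCert ignores where each cert came from and sorts the certs themselves,
// so neither list order nor source names influence the result.
func bundleByCert(sources []source) string {
	var certs []string
	for _, s := range sources {
		certs = append(certs, s.certs...)
	}
	sort.Strings(certs)
	return strings.Join(certs, "\n")
}
```

With sources `a` (holding `cert-1`) and `b` (holding `cert-2`), both functions are unaffected by the order in which the sources are listed. Renaming `a` to `z` changes the output of `bundleBySourceName` but not that of `bundleByCert`.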

jabdoa2 avatar Jul 12 '24 16:07 jabdoa2