cf-for-k8s

Installation script is not idempotent:

Open · acosta11 opened this issue 5 years ago · 6 comments

Summary

Re-running the install script with the same input values after a successful installation should be a no-op update with no diff. However, when redeploying we currently see a diff on the resource serviceaccount/cc-kpack-registry-service-account (v1) namespace: cf-workloads-staging.

Reproduction Steps

  • K8s target: K8s 1.16.8-gke.3, provisioned via GKE on the rapid channel
  • cf-for-k8s @ 2c8a31c7ca2f4aad968fb7e11ea16116789d1e9b or later

Run the installation as normal following the instructions from the deploy docs.

$ kapp deploy -a cf -f <(ytt -f config -f my-values.yml)
... (trimmed)
Succeeded

Install a second time (with line-by-line diff enabled)

$ kapp deploy -a cf -c --diff-context=-1 -f <(ytt -f config -f my-values.yml)
Target cluster 'https://<my k8s api>' (nodes: <my pool>, 4+)

--- update serviceaccount/cc-kpack-registry-service-account (v1) namespace: cf-workloads-staging
  0,  0   apiVersion: v1
  1,  1   kind: ServiceAccount
  2,  2   metadata:
  3,  3     annotations: {}
  4,  4     creationTimestamp: "2020-04-01T17:05:59Z"
  5,  5     labels:
  6,  6       kapp.k14s.io/app: "1585760667482283000"
  7,  7       kapp.k14s.io/association: v1.fb58b381e47d2efd39ddb2f7d03512db
  8,  8     name: cc-kpack-registry-service-account
  9,  9     namespace: cf-workloads-staging
 10, 10     resourceVersion: "54157"
 11, 11     selfLink: /api/v1/namespaces/cf-workloads-staging/serviceaccounts/cc-kpack-registry-service-account
 12, 12     uid: 400c5671-a37f-41fe-a2f1-cc806fea1771
 13, 13   secrets:
 14, 14   - name: cc-kpack-registry-auth-secret
 15     - - name: cc-kpack-registry-service-account-token-jvgb5
 16, 15

Changes

Namespace             Name                               Kind            Conds.  Age  Op      Wait to    Rs  Ri
cf-workloads-staging  cc-kpack-registry-service-account  ServiceAccount  -       11m  update  reconcile  ok  -

Op:      0 create, 0 delete, 1 update, 0 noop
Wait to: 1 reconcile, 0 delete, 0 noop

We expected to see no changes, but instead saw a diff on line 15: the removal of a service account token that was added by the Token Controller outside of kapp's knowledge and rebase rules configuration (`15 - - name: cc-kpack-registry-service-account-token-jvgb5`).
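To confirm the extra entry comes from the cluster rather than from the rendered templates, the live ServiceAccount can be inspected directly with plain kubectl (the two secret names shown as comments are simply the ones from the diff above):

# List the secrets referenced by the live ServiceAccount; the first comes from the
# cf-for-k8s config, the token was appended by the Token Controller on its own.
kubectl -n cf-workloads-staging get serviceaccount cc-kpack-registry-service-account \
  -o jsonpath='{range .secrets[*]}{.name}{"\n"}{end}'
# cc-kpack-registry-auth-secret
# cc-kpack-registry-service-account-token-jvgb5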

Related / Blocking Issue and Proposed Solution

See https://github.com/k14s/kapp/issues/93. This particular diff is the result of the Token Controller adding an element to the secrets array of a ServiceAccount, and kapp currently has no mechanism to select a particular element of an arbitrary array structure when rebasing. When that functionality becomes available, we would add a rebase rule to accept the secret added by the Token Controller.
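For reference, the path-based rebase rules kapp does support copy a whole field from one source, which is too coarse for this case. A minimal sketch of that syntax follows; the kindNamespaceNameMatcher and sources keys follow kapp's Config schema as I recall it, so treat this as an illustration of the limitation rather than a tested cf-for-k8s change:

apiVersion: kapp.k14s.io/v1alpha1
kind: Config
rebaseRules:
# Keep whatever the cluster already has under .secrets for this ServiceAccount, so the
# Token Controller's auto-generated token no longer shows up as a removal in the diff.
# Caveat: this copies the entire array from the existing resource, so a renamed
# registry secret in the config (e.g. a -ver-2 suffix) would not be picked up either;
# per-element selection is exactly what k14s/kapp#93 asks for.
- path: [secrets]
  type: copy
  sources: [existing]
  resourceMatchers:
  - kindNamespaceNameMatcher:
      kind: ServiceAccount
      namespace: cf-workloads-staging
      name: cc-kpack-registry-service-account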

acosta11 · Apr 01 '20

We have created an issue in Pivotal Tracker to manage this:

https://www.pivotaltracker.com/story/show/172109841

The labels on this github issue will be updated when the story is started.

cf-gitbot · Apr 01 '20

Hello, I don't know if this describes exactly the same problem. When I try to run kapp a second time on an AKS cluster within my pipeline, I get the following error:

2020-08-11T11:58:20.6315497Z + kapp deploy -a cf -f /tmp/cf-for-k8s-rendered.yml -y
2020-08-11T11:58:20.7857924Z Target cluster 'https://xxxxxxxxx.xxx.westeurope.azmk8s.io:443' (nodes: aks-xxxxpool-15787200-vmss000002, 2+)
[...]
2020-08-11T12:05:29.9278102Z 12:05:29PM: ^ Failed with reason BackoffLimitExceeded: Job has reached the specified backoff limit
2020-08-11T12:05:30.0363750Z
2020-08-11T12:05:30.0366524Z kapp: Error: waiting on reconcile job/ccdb-migrate (batch/v1) namespace: cf-system:
2020-08-11T12:05:30.0367406Z Finished unsuccessfully (Failed with reason BackoffLimitExceeded: Job has reached the specified backoff limit)

Debugging in the cluster:

_______________________________________________________________________________________
immi@NB170919IHA:~$ k get jobs -A
NAMESPACE   NAME           COMPLETIONS   DURATION   AGE
cf-system   ccdb-migrate   0/1           18m        18m

_______________________________________________________________________________________
immi@NB170919IHA:~$ k describe job/ccdb-migrate -n cf-system
Name:           ccdb-migrate
Namespace:      cf-system
Selector:       controller-uid=529613af-d15f-436b-ac6a-b6aa71986046
Labels:         kapp.k14s.io/app=1597144180483705869
                kapp.k14s.io/association=v1.0c0034b280eaf52e1f449f050107b0a2
Annotations:    kapp.k14s.io/change-rule.cf-db-postgresql: upsert after upserting cf-for-k8s.cloudfoundry.org/cf-db-postgresql
                kapp.k14s.io/change-rule.istio-sidecar-injector: upsert after upserting cf-for-k8s.cloudfoundry.org/istio-sidecar-injector
                kapp.k14s.io/identity: v1;cf-system/batch/Job/ccdb-migrate;batch/v1
                kapp.k14s.io/original:
                  {"apiVersion":"batch/v1","kind":"Job","metadata":{"annotations":{"kapp.k14s.io/change-rule.cf-db-postgresql":"upsert after upserting cf-fo...
                kapp.k14s.io/original-diff-md5: dcc4294c09ed95a83db47b69316e6358
                kapp.k14s.io/update-strategy: fallback-on-replace
Parallelism:    1
Completions:    1
Start Time:     Tue, 11 Aug 2020 13:58:28 +0200
Pods Statuses:  0 Running / 0 Succeeded / 1 Failed
Pod Template:
  Labels:  controller-uid=529613af-d15f-436b-ac6a-b6aa71986046
           job-name=ccdb-migrate
           kapp.k14s.io/app=1597144180483705869
           kapp.k14s.io/association=v1.0c0034b280eaf52e1f449f050107b0a2
  Containers:
   run-migrations:
    Image:      cloudfoundry/cloud-controller-ng@sha256:2c9a5f1d66163d7d3f376608d2343ccc1143de96194bb9a513f713e6d7bdbcee
    Port:       <none>
    Host Port:  <none>
    Command:
      /bin/bash
      -c
    Args:
      bundle exec rake db:wait_for_istio && \
      bundle exec rake db:setup_database && \
      bundle exec rake db:terminate_istio

    Environment:  <none>
    Mounts:
      /config from cloud-controller-ng-yaml (rw)
  Volumes:
   cloud-controller-ng-yaml:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      cloud-controller-ng-yaml-ver-2
    Optional:  false
Events:
  Type     Reason                Age   From            Message
  ----     ------                ----  ----            -------
  Normal   SuccessfulCreate      18m   job-controller  Created pod: ccdb-migrate-8tj48
  Normal   SuccessfulDelete      11m   job-controller  Deleted pod: ccdb-migrate-8tj48
  Warning  BackoffLimitExceeded  11m   job-controller  Job has reached the specified backoff limit

___________________________________________________________________________________________________________
immi@NB170919IHA:~$ k logs pod/ccdb-migrate-8tj48 -n cf-system
Error from server (NotFound): pods "ccdb-migrate-8tj48" not found
immi@NB170919IHA:~$ k get po -n cf-system
NAME                                         READY   STATUS             RESTARTS   AGE
cf-api-clock-64d89957bc-wrfcx                1/2     CrashLoopBackOff   8          20m
cf-api-controllers-8685699457-7wclc          2/2     Running            1          20m
cf-api-deployment-updater-68bd744548-4fl6f   1/2     CrashLoopBackOff   8          20m
cf-api-server-58f6f5c69c-lsfm8               5/5     Running            2          65m
cf-api-server-7f4cbc8fb8-txtt9               2/5     CrashLoopBackOff   16         20m
cf-api-worker-6fbb57897d-ck8xx               1/2     CrashLoopBackOff   8          20m
eirini-58dcdc94b8-n4cpx                      2/2     Running            0          20m
eirini-controller-665c8d57c9-rzc5k           2/2     Running            1          20m
eirini-events-6fd6dc8bdf-5xnhx               2/2     Running            0          20m
eirini-task-reporter-df4d5684c-hqc4f         2/2     Running            0          20m
fluentd-cr29l                                2/2     Running            1          19m
fluentd-j8vkd                                2/2     Running            1          20m
fluentd-w5lv6                                2/2     Running            1          19m
log-cache-8c998f967-9lmwp                    5/5     Running            2          65m
metric-proxy-c998c7c5d-7ffmx                 2/2     Running            0          20m
routecontroller-757d9855d9-68dlc             2/2     Running            2          68m
uaa-557f56fbd6-gwkxf                         3/3     Running            2          68m
uaa-77f46dddf5-xfcjn                         2/3     CrashLoopBackOff   8          20m
________________________________________________________________________________________________________________________________________
immi@NB170919IHA:~$ k logs pod/uaa-77f46dddf5-xfcjn -n cf-system -c uaa

[...]
[CONTAINER] lina.core.ContainerBase.[Catalina].[localhost].[/] SEVERE  Servlet [spring] in web application [] threw load() exception
org.postgresql.util.PSQLException: FATAL: password authentication failed for user "uaa"
[...]

immi@NB170919IHA:~$ k logs pod/cf-api-server-7f4cbc8fb8-txtt9 -n cf-system -c cf-api-server
{"timestamp":"2020-08-11T12:20:48.217484428Z","message":"Encountered error: PG::ConnectionBad: FATAL:  password authentication failed for user \"cloud_controller\"\
[...]

_____________________________________________________________________________________________________________________________________
immi@NB170919IHA:~$ k logs pod/cf-api-server-7f4cbc8fb8-txtt9 -n cf-system -c cf-api-local-worker
rake aborted!
Sequel::DatabaseConnectionError: PG::ConnectionBad: FATAL:  password authentication failed for user "cloud_controller"
/cloud_controller_ng/lib/cloud_controller/db.rb:42:in `get_connection'
/cloud_controller_ng/lib/cloud_controller/db.rb:25:in `connect'
/cloud_controller_ng/lib/cloud_controller/db.rb:51:in `load_models'
/cloud_controller_ng/lib/cloud_controller/background_job_environment.rb:15:in `setup_environment'
/cloud_controller_ng/lib/tasks/jobs.rake:65:in `start_working'
/cloud_controller_ng/lib/tasks/jobs.rake:20:in `block (2 levels) in <top (required)>'
/usr/local/bin/bundle:23:in `load'
/usr/local/bin/bundle:23:in `<main>'

Caused by:
PG::ConnectionBad: FATAL:  password authentication failed for user "cloud_controller"
[...]


It seems that after the second deployment some values, passwords/certs or whatever, are broken? Or maybe the certs are different because I rerun the bosh hack script within my pipeline, I guess.

FYI: even if I run kapp delete -a cf before I deploy again, I get errors. Let me know if you need more information or debugging output. Cheers

immae1 · Aug 11 '20

> It seems that after the second deployment some values, passwords/certs or whatever, are broken? Or maybe the certs are different because I rerun the bosh hack script within my pipeline

If your pipeline is regenerating the input values file as you said, then I would expect some known issues with cert/credential rotation to be the problem; see https://github.com/cloudfoundry/cf-for-k8s/issues/45. (The error messages make me think your postgres password config is changing, but the database deployment itself has not reflected that change.)
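If that is the cause, one way to rule out credential churn is to generate the values file once, persist it, and feed the identical file to every deploy instead of regenerating it per pipeline run. A rough sketch, assuming the cf-for-k8s value-generation helper lives at hack/generate-values.sh and takes a -d domain flag (adjust to whatever your pipeline actually invokes; CF_DOMAIN is a placeholder):

# One-time: generate and persist the deploy values (passwords, certs, etc.).
# Re-running the generator mints fresh credentials, which is what breaks redeploys.
./hack/generate-values.sh -d "${CF_DOMAIN}" > cf-values.yml   # hypothetical path/flag, see above
# Every deploy afterwards reuses the exact same file, so the inputs stay constant:
kapp deploy -a cf -f <(ytt -f config -f cf-values.yml) -y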

I would consider this a separate issue, in that the inputs to the deploy are materially different, so it doesn't test whether the kapp deploy operation is idempotent.

acosta11 · Aug 11 '20

We are running into the same issue when trying to manage our deployments with GitOps. We want to track what's changing between deployments, but with the current setup there is always something that changes whether we make any changes or not.

braunsonm · Nov 16 '20

Here's a list of the resources that seem to change each time:

Namespace             Name                               Kind                            Conds.  Age  Op      Op st.  Wait to    Rs  Ri  
(cluster)             defaults.webhook.kpack.io          MutatingWebhookConfiguration    -       34d  update  -       reconcile  ok  -  
^                     istio-sidecar-injector             MutatingWebhookConfiguration    -       34d  update  -       reconcile  ok  -  
^                     istiod-istio-system                ValidatingWebhookConfiguration  -       34d  update  -       reconcile  ok  -  
^                     validation.webhook.kpack.io        ValidatingWebhookConfiguration  -       34d  update  -       reconcile  ok  -  
cf-workloads-staging  cc-kpack-registry-service-account  ServiceAccount                  -       34d  update  -       reconcile  ok  -  
kpack                 webhook-certs                      Secret                          -       34d  update  -       reconcile  ok  -  
@@ update serviceaccount/cc-kpack-registry-service-account (v1) namespace: cf-workloads-staging @@
  ...
  4,  4   metadata:
  5     -   annotations: {}
  6,  5     creationTimestamp: "2020-11-16T15:31:57Z"
  7,  6     labels:
  ...
 49, 48   - name: cc-kpack-registry-auth-secret-ver-2
 50     - - name: cc-kpack-registry-service-account-token-zsxft
 51, 49   
@@ update validatingwebhookconfiguration/istiod-istio-system (admissionregistration.k8s.io/v1beta1) cluster @@
  ...
  2,  2   metadata:
  3     -   annotations: {}
  4,  3     creationTimestamp: "2020-11-16T15:31:49Z"
  5,  4     generation: 288
  ...
 83, 82     clientConfig:
 84     -     caBundle: SNIP
     83 +     caBundle: ""
 85, 84       service:
 86, 85         name: istiod
  ...
 88, 87         path: /validate
 89     -       port: 443
 90     -   failurePolicy: Fail
 91     -   matchPolicy: Exact
     88 +   failurePolicy: Ignore
 92, 89     name: validation.istio.io
 93     -   namespaceSelector:
 94     -     matchExpressions:
 95     -     - key: control-plane
 96     -       operator: DoesNotExist
 97     -   objectSelector: {}
 98, 90     rules:
 99, 91     - apiGroups:
  ...
110,102       - '*'
111     -     scope: '*'
112,103     sideEffects: None
113     -   timeoutSeconds: 30
114,104   
@@ update mutatingwebhookconfiguration/istio-sidecar-injector (admissionregistration.k8s.io/v1beta1) cluster @@
  ...
 95, 95         path: /inject
 96     -       port: 443
 97, 96     failurePolicy: Fail
 98     -   matchPolicy: Exact
 99, 97     name: sidecar-injector.istio.io
100, 98     namespaceSelector:
101     -     matchExpressions:
102     -     - key: control-plane
103     -       operator: DoesNotExist
104, 99       matchLabels:
105,100         istio-injection: enabled
106     -   objectSelector: {}
107     -   reinvocationPolicy: Never
108,101     rules:
109,102     - apiGroups:
  ...
116,109       - pods
117     -     scope: '*'
118,110     sideEffects: None
119,111     timeoutSeconds: 30
@@ update mutatingwebhookconfiguration/defaults.webhook.kpack.io (admissionregistration.k8s.io/v1beta1) cluster @@
  ...
  2,  2   metadata:
  3     -   annotations: {}
      3 +   annotations:
      4 +     kapp.k14s.io/original: '{"apiVersion":"admissionregistration.k8s.io/v1beta1","kind":"MutatingWebhookConfiguration","metadata":{"labels":{"kapp.k14s.io/app":"1605540699896974037","kapp.k14s.io/association":"v1.84caf0a6517454a543ee30b896a4d599"},"name":"defaults.webhook.kpack.io"},"webhooks":[{"admissionReviewVersions":["v1beta1"],"clientConfig":{"service":{"name":"kpack-webhook","namespace":"kpack"}},"failurePolicy":"Fail","name":"defaults.webhook.kpack.io","sideEffects":"None"}]}'
      5 +     kapp.k14s.io/original-diff-md5: e053721e8727e53200a8cd368a0681d9
  4,  6     creationTimestamp: "2020-11-16T15:31:49Z"
  5,  7     generation: 21510612
@@ update validatingwebhookconfiguration/validation.webhook.kpack.io (admissionregistration.k8s.io/v1beta1) cluster @@
  ...
  2,  2   metadata:
  3     -   annotations: {}
  4,  3     creationTimestamp: "2020-11-16T15:31:49Z"
  5,  4     generation: 22009278
  ...
 79, 78         namespace: kpack
 80     -       path: /validate
 81     -       port: 443
 82, 79     failurePolicy: Fail
 83     -   matchPolicy: Exact
 84, 80     name: validation.webhook.kpack.io
 85     -   namespaceSelector:
 86     -     matchExpressions:
 87     -     - key: webhooks.knative.dev/exclude
 88     -       operator: DoesNotExist
 89     -     - key: control-plane
 90     -       operator: DoesNotExist
 91     -   objectSelector: {}
 92     -   rules:
 93     -   - apiGroups:
 94     -     - kpack.io
 95     -     apiVersions:
 96     -     - v1alpha1
 97     -     operations:
 98     -     - CREATE
 99     -     - UPDATE
100     -     - DELETE
101     -     resources:
102     -     - builders/*
103     -     scope: '*'
104     -   - apiGroups:
105     -     - kpack.io
106     -     apiVersions:
107     -     - v1alpha1
108     -     operations:
109     -     - CREATE
110     -     - UPDATE
111     -     - DELETE
112     -     resources:
113     -     - builds/*
114     -     scope: '*'
115     -   - apiGroups:
116     -     - kpack.io
117     -     apiVersions:
118     -     - v1alpha1
119     -     operations:
120     -     - CREATE
121     -     - UPDATE
122     -     - DELETE
123     -     resources:
124     -     - clusterbuilders/*
125     -     scope: '*'
126     -   - apiGroups:
127     -     - kpack.io
128     -     apiVersions:
129     -     - v1alpha1
130     -     operations:
131     -     - CREATE
132     -     - UPDATE
133     -     - DELETE
134     -     resources:
135     -     - clusterstacks/*
136     -     scope: '*'
137     -   - apiGroups:
138     -     - kpack.io
139     -     apiVersions:
140     -     - v1alpha1
141     -     operations:
142     -     - CREATE
143     -     - UPDATE
144     -     - DELETE
145     -     resources:
146     -     - clusterstores/*
147     -     scope: '*'
148     -   - apiGroups:
149     -     - kpack.io
150     -     apiVersions:
151     -     - v1alpha1
152     -     operations:
153     -     - CREATE
154     -     - UPDATE
155     -     - DELETE
156     -     resources:
157     -     - images/*
158     -     scope: '*'
159, 81     sideEffects: None
160     -   timeoutSeconds: 30
161, 82   
@@ update secret/webhook-certs (v1) namespace: kpack @@
  0,  0   apiVersion: v1
  1     - data:
  2     -   ca-cert.pem: <-- value not shown (#1)
  3     -   server-cert.pem: <-- value not shown (#2)
  4     -   server-key.pem: <-- value not shown (#3)
  5,  1   kind: Secret
  6,  2   metadata:
  7     -   annotations: {}
  8,  3     creationTimestamp: "2020-11-16T15:31:58Z"
  9,  4     labels:
  ...
 45, 40     uid: 7e38f5e9-9600-44d9-bd8b-f17d164b41f3
 46     - type: Opaque
 47, 41   

Additionally, the deployment will sometimes fail because one of these resources changes while we are running the deployment.
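For the webhook entries in particular, the caBundle churn could in principle be absorbed with a path-based rebase rule that keeps the value the controllers patch in at runtime. A sketch follows, assuming kapp's {allIndexes: true} path syntax for traversing array elements; it would not address the other defaulted fields shown in the diff, and it is untested against cf-for-k8s:

apiVersion: kapp.k14s.io/v1alpha1
kind: Config
rebaseRules:
# Preserve the caBundle that istiod / the kpack webhook populate on the live
# webhook configurations, instead of diffing it against the empty value in the
# source YAML on every deploy.
- path: [webhooks, {allIndexes: true}, clientConfig, caBundle]
  type: copy
  sources: [existing]
  resourceMatchers:
  - apiVersionKindMatcher: {apiVersion: admissionregistration.k8s.io/v1beta1, kind: MutatingWebhookConfiguration}
  - apiVersionKindMatcher: {apiVersion: admissionregistration.k8s.io/v1beta1, kind: ValidatingWebhookConfiguration}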

braunsonm · Dec 21 '20

I'm interested in completing this; however, it is not possible until the following are fixed in kapp:

  • https://github.com/vmware-tanzu/carvel-kapp/issues/190
  • https://github.com/vmware-tanzu/carvel-kapp/issues/191

braunsonm · Feb 08 '21