Installation script is not idempotent
Summary
Re-running the install script with the same input values after a successful installation should be a no-op update with no diff. However, when redeploying, we currently see a diff on the resource serviceaccount/cc-kpack-registry-service-account (v1) in namespace cf-workloads-staging.
Reproduction Steps
K8s target: K8s 1.16.8-gke.3, provisioned via GKE on the rapid channel
cf-for-k8s @ 2c8a31c7ca2f4aad968fb7e11ea16116789d1e9b or later
Run the installation as normal, following the instructions in the deploy docs.
$ kapp deploy -a cf -f <(ytt -f config -f my-values.yml)
... (trimmed)
Succeeded
Install a second time (with line-by-line diff enabled)
$ kapp deploy -a cf -c --diff-context=-1 -f <(ytt -f config -f my-values.yml)
Target cluster 'https://<my k8s api>' (nodes: <my pool>, 4+)
--- update serviceaccount/cc-kpack-registry-service-account (v1) namespace: cf-workloads-staging
0, 0 apiVersion: v1
1, 1 kind: ServiceAccount
2, 2 metadata:
3, 3 annotations: {}
4, 4 creationTimestamp: "2020-04-01T17:05:59Z"
5, 5 labels:
6, 6 kapp.k14s.io/app: "1585760667482283000"
7, 7 kapp.k14s.io/association: v1.fb58b381e47d2efd39ddb2f7d03512db
8, 8 name: cc-kpack-registry-service-account
9, 9 namespace: cf-workloads-staging
10, 10 resourceVersion: "54157"
11, 11 selfLink: /api/v1/namespaces/cf-workloads-staging/serviceaccounts/cc-kpack-registry-service-account
12, 12 uid: 400c5671-a37f-41fe-a2f1-cc806fea1771
13, 13 secrets:
14, 14 - name: cc-kpack-registry-auth-secret
15 - - name: cc-kpack-registry-service-account-token-jvgb5
16, 15
Changes
Namespace Name Kind Conds. Age Op Wait to Rs Ri
cf-workloads-staging cc-kpack-registry-service-account ServiceAccount - 11m update reconcile ok -
Op: 0 create, 0 delete, 1 update, 0 noop
Wait to: 1 reconcile, 0 delete, 0 noop
We expected to see no changes, but instead saw a diff at line 15: the removal of a service account token that the Token Controller added outside of kapp's knowledge and rebase-rules configuration: 15 - - name: cc-kpack-registry-service-account-token-jvgb5 .
Related / Blocking Issue and Proposed Solution
https://github.com/k14s/kapp/issues/93
This particular diff is the result of the Token Controller adding an element to the secrets array of a ServiceAccount, and kapp does not yet have a mechanism for selecting a particular element of an arbitrary array structure when rebasing. When that functionality becomes available, we would add a rebase rule to accept the secret added by the Token Controller.
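For context, rebase rules are provided to kapp as a Config document included in the deploy (via -f). A whole-array rule along the lines of the rough sketch below would keep the token, but it would also mask any intentional changes to the secrets list, which is exactly why per-element selection is needed; the matcher and path syntax here are illustrative and should be checked against the kapp config docs:

```yaml
# Rough sketch only (not the proposed fix): keep the cluster's entire
# `secrets` array, including the token added by the Token Controller.
# Downside: intentional changes to `secrets` in our config would be ignored,
# hence the need for per-element selection tracked in the linked kapp issue.
apiVersion: kapp.k14s.io/v1alpha1
kind: Config
rebaseRules:
- path: [secrets]
  type: copy
  sources: [existing]
  resourceMatchers:
  - kindNamespaceNameMatcher:
      kind: ServiceAccount
      namespace: cf-workloads-staging
      name: cc-kpack-registry-service-account
```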
We have created an issue in Pivotal Tracker to manage this:
https://www.pivotaltracker.com/story/show/172109841
The labels on this GitHub issue will be updated when the story is started.
Hello, I don't know if this describes exactly the same problem. When I try to run kapp a second time on an AKS cluster within my pipeline, I get the following error:
2020-08-11T11:58:20.6315497Z + kapp deploy -a cf -f /tmp/cf-for-k8s-rendered.yml -y
2020-08-11T11:58:20.7857924Z Target cluster 'https://xxxxxxxxx.xxx.westeurope.azmk8s.io:443' (nodes: aks-xxxxpool-15787200-vmss000002, 2+)
[...]
2020-08-11T12:05:29.9278102Z 12:05:29PM: ^ Failed with reason BackoffLimitExceeded: Job has reached the specified backoff limit
2020-08-11T12:05:30.0363750Z
2020-08-11T12:05:30.0366524Z kapp: Error: waiting on reconcile job/ccdb-migrate (batch/v1) namespace: cf-system:
2020-08-11T12:05:30.0367406Z Finished unsuccessfully (Failed with reason BackoffLimitExceeded: Job has reached the specified backoff limit)
Debugging in the cluster:
_______________________________________________________________________________________
immi@NB170919IHA:~$ k get jobs -A
NAMESPACE NAME COMPLETIONS DURATION AGE
cf-system ccdb-migrate 0/1 18m 18m
_______________________________________________________________________________________
immi@NB170919IHA:~$ k describe job/ccdb-migrate -n cf-system
Name: ccdb-migrate
Namespace: cf-system
Selector: controller-uid=529613af-d15f-436b-ac6a-b6aa71986046
Labels: kapp.k14s.io/app=1597144180483705869
kapp.k14s.io/association=v1.0c0034b280eaf52e1f449f050107b0a2
Annotations: kapp.k14s.io/change-rule.cf-db-postgresql: upsert after upserting cf-for-k8s.cloudfoundry.org/cf-db-postgresql
kapp.k14s.io/change-rule.istio-sidecar-injector: upsert after upserting cf-for-k8s.cloudfoundry.org/istio-sidecar-injector
kapp.k14s.io/identity: v1;cf-system/batch/Job/ccdb-migrate;batch/v1
kapp.k14s.io/original:
{"apiVersion":"batch/v1","kind":"Job","metadata":{"annotations":{"kapp.k14s.io/change-rule.cf-db-postgresql":"upsert after upserting cf-fo...
kapp.k14s.io/original-diff-md5: dcc4294c09ed95a83db47b69316e6358
kapp.k14s.io/update-strategy: fallback-on-replace
Parallelism: 1
Completions: 1
Start Time: Tue, 11 Aug 2020 13:58:28 +0200
Pods Statuses: 0 Running / 0 Succeeded / 1 Failed
Pod Template:
Labels: controller-uid=529613af-d15f-436b-ac6a-b6aa71986046
job-name=ccdb-migrate
kapp.k14s.io/app=1597144180483705869
kapp.k14s.io/association=v1.0c0034b280eaf52e1f449f050107b0a2
Containers:
run-migrations:
Image: cloudfoundry/cloud-controller-ng@sha256:2c9a5f1d66163d7d3f376608d2343ccc1143de96194bb9a513f713e6d7bdbcee
Port: <none>
Host Port: <none>
Command:
/bin/bash
-c
Args:
bundle exec rake db:wait_for_istio && \
bundle exec rake db:setup_database && \
bundle exec rake db:terminate_istio
Environment: <none>
Mounts:
/config from cloud-controller-ng-yaml (rw)
Volumes:
cloud-controller-ng-yaml:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: cloud-controller-ng-yaml-ver-2
Optional: false
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal SuccessfulCreate 18m job-controller Created pod: ccdb-migrate-8tj48
Normal SuccessfulDelete 11m job-controller Deleted pod: ccdb-migrate-8tj48
Warning BackoffLimitExceeded 11m job-controller Job has reached the specified backoff limit
___________________________________________________________________________________________________________
immi@NB170919IHA:~$ k logs pod/ccdb-migrate-8tj48 -n cf-system
Error from server (NotFound): pods "ccdb-migrate-8tj48" not found
immi@NB170919IHA:~$ k get po -n cf-system
NAME READY STATUS RESTARTS AGE
cf-api-clock-64d89957bc-wrfcx 1/2 CrashLoopBackOff 8 20m
cf-api-controllers-8685699457-7wclc 2/2 Running 1 20m
cf-api-deployment-updater-68bd744548-4fl6f 1/2 CrashLoopBackOff 8 20m
cf-api-server-58f6f5c69c-lsfm8 5/5 Running 2 65m
cf-api-server-7f4cbc8fb8-txtt9 2/5 CrashLoopBackOff 16 20m
cf-api-worker-6fbb57897d-ck8xx 1/2 CrashLoopBackOff 8 20m
eirini-58dcdc94b8-n4cpx 2/2 Running 0 20m
eirini-controller-665c8d57c9-rzc5k 2/2 Running 1 20m
eirini-events-6fd6dc8bdf-5xnhx 2/2 Running 0 20m
eirini-task-reporter-df4d5684c-hqc4f 2/2 Running 0 20m
fluentd-cr29l 2/2 Running 1 19m
fluentd-j8vkd 2/2 Running 1 20m
fluentd-w5lv6 2/2 Running 1 19m
log-cache-8c998f967-9lmwp 5/5 Running 2 65m
metric-proxy-c998c7c5d-7ffmx 2/2 Running 0 20m
routecontroller-757d9855d9-68dlc 2/2 Running 2 68m
uaa-557f56fbd6-gwkxf 3/3 Running 2 68m
uaa-77f46dddf5-xfcjn 2/3 CrashLoopBackOff 8 20m
________________________________________________________________________________________________________________________________________
immi@NB170919IHA:~$ k logs pod/uaa-77f46dddf5-xfcjn -n cf-system -c uaa
[...]
[CONTAINER] lina.core.ContainerBase.[Catalina].[localhost].[/] SEVERE Servlet [spring] in web application [] threw load() exception
org.postgresql.util.PSQLException: FATAL: password authentication failed for user "uaa"
[...]
immi@NB170919IHA:~$ k logs pod/cf-api-server-7f4cbc8fb8-txtt9 -n cf-system -c cf-api-server
{"timestamp":"2020-08-11T12:20:48.217484428Z","message":"Encountered error: PG::ConnectionBad: FATAL: password authentication failed for user \"cloud_controller\"\
[...]
_____________________________________________________________________________________________________________________________________
immi@NB170919IHA:~$ k logs pod/cf-api-server-7f4cbc8fb8-txtt9 -n cf-system -c cf-api-local-worker
rake aborted!
Sequel::DatabaseConnectionError: PG::ConnectionBad: FATAL: password authentication failed for user "cloud_controller"
/cloud_controller_ng/lib/cloud_controller/db.rb:42:in `get_connection'
/cloud_controller_ng/lib/cloud_controller/db.rb:25:in `connect'
/cloud_controller_ng/lib/cloud_controller/db.rb:51:in `load_models'
/cloud_controller_ng/lib/cloud_controller/background_job_environment.rb:15:in `setup_environment'
/cloud_controller_ng/lib/tasks/jobs.rake:65:in `start_working'
/cloud_controller_ng/lib/tasks/jobs.rake:20:in `block (2 levels) in <top (required)>'
/usr/local/bin/bundle:23:in `load'
/usr/local/bin/bundle:23:in `<main>'
Caused by:
PG::ConnectionBad: FATAL: password authentication failed for user "cloud_controller"
[...]
It seems that after the second deployment some values (passwords/certs or whatever) are broken? Or maybe the certs are different because I rerun the bosh hack script within my pipeline, I guess...
FYI: even if I run kapp delete -a cf before I deploy again, I get errors. Let me know if you need more information or debugging output. Cheers
It seems that after the second deployment some values (passwords/certs or whatever) are broken? Or maybe the certs are different because I rerun the bosh hack script within my pipeline
If your pipeline is regenerating the input values file like you said, then I would expect some known issues with cert/credential rotation to be the problem. See https://github.com/cloudfoundry/cf-for-k8s/issues/45. (The error message makes me think your postgres password config is changing, but the database deployment itself has not reflected that change.)
I would consider this a separate issue, in that the inputs to the deploy are materially different, so we aren't really testing whether the kapp deploy operation is idempotent.
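If it helps, one way to keep redeploys comparable from a pipeline is to generate the values file once and reuse it on later runs instead of regenerating credentials every time. A rough sketch, assuming the standard hack/generate-values.sh script and a persistent location for the values file (the paths and CF_DOMAIN variable are illustrative):

```bash
# Sketch only: persist the generated values so repeated deploys use
# identical credentials/certs instead of freshly generated ones.
VALUES=/pipeline/state/cf-values.yml
if [ ! -f "$VALUES" ]; then
  ./hack/generate-values.sh -d "${CF_DOMAIN}" > "$VALUES"
fi
ytt -f config -f "$VALUES" > /tmp/cf-for-k8s-rendered.yml
kapp deploy -a cf -f /tmp/cf-for-k8s-rendered.yml -y
```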
We are running into the same issue when trying to GitOps our deployments. We want to track what is changing between deployments, but with the current setup there is always something that changes, whether we make any changes or not.
Here's a list of the resources that seem to change each time:
Namespace Name Kind Conds. Age Op Op st. Wait to Rs Ri
(cluster) defaults.webhook.kpack.io MutatingWebhookConfiguration - 34d update - reconcile ok -
^ istio-sidecar-injector MutatingWebhookConfiguration - 34d update - reconcile ok -
^ istiod-istio-system ValidatingWebhookConfiguration - 34d update - reconcile ok -
^ validation.webhook.kpack.io ValidatingWebhookConfiguration - 34d update - reconcile ok -
cf-workloads-staging cc-kpack-registry-service-account ServiceAccount - 34d update - reconcile ok -
kpack webhook-certs Secret - 34d update - reconcile ok -
@@ update serviceaccount/cc-kpack-registry-service-account (v1) namespace: cf-workloads-staging @@
...
4, 4 metadata:
5 - annotations: {}
6, 5 creationTimestamp: "2020-11-16T15:31:57Z"
7, 6 labels:
...
49, 48 - name: cc-kpack-registry-auth-secret-ver-2
50 - - name: cc-kpack-registry-service-account-token-zsxft
51, 49
@@ update validatingwebhookconfiguration/istiod-istio-system (admissionregistration.k8s.io/v1beta1) cluster @@
...
2, 2 metadata:
3 - annotations: {}
4, 3 creationTimestamp: "2020-11-16T15:31:49Z"
5, 4 generation: 288
...
83, 82 clientConfig:
84 - caBundle: SNIP
83 + caBundle: ""
85, 84 service:
86, 85 name: istiod
...
88, 87 path: /validate
89 - port: 443
90 - failurePolicy: Fail
91 - matchPolicy: Exact
88 + failurePolicy: Ignore
92, 89 name: validation.istio.io
93 - namespaceSelector:
94 - matchExpressions:
95 - - key: control-plane
96 - operator: DoesNotExist
97 - objectSelector: {}
98, 90 rules:
99, 91 - apiGroups:
...
110,102 - '*'
111 - scope: '*'
112,103 sideEffects: None
113 - timeoutSeconds: 30
114,104
@@ update mutatingwebhookconfiguration/istio-sidecar-injector (admissionregistration.k8s.io/v1beta1) cluster @@
...
95, 95 path: /inject
96 - port: 443
97, 96 failurePolicy: Fail
98 - matchPolicy: Exact
99, 97 name: sidecar-injector.istio.io
100, 98 namespaceSelector:
101 - matchExpressions:
102 - - key: control-plane
103 - operator: DoesNotExist
104, 99 matchLabels:
105,100 istio-injection: enabled
106 - objectSelector: {}
107 - reinvocationPolicy: Never
108,101 rules:
109,102 - apiGroups:
...
116,109 - pods
117 - scope: '*'
118,110 sideEffects: None
119,111 timeoutSeconds: 30
@@ update mutatingwebhookconfiguration/defaults.webhook.kpack.io (admissionregistration.k8s.io/v1beta1) cluster @@
...
2, 2 metadata:
3 - annotations: {}
3 + annotations:
4 + kapp.k14s.io/original: '{"apiVersion":"admissionregistration.k8s.io/v1beta1","kind":"MutatingWebhookConfiguration","metadata":{"labels":{"kapp.k14s.io/app":"1605540699896974037","kapp.k14s.io/association":"v1.84caf0a6517454a543ee30b896a4d599"},"name":"defaults.webhook.kpack.io"},"webhooks":[{"admissionReviewVersions":["v1beta1"],"clientConfig":{"service":{"name":"kpack-webhook","namespace":"kpack"}},"failurePolicy":"Fail","name":"defaults.webhook.kpack.io","sideEffects":"None"}]}'
5 + kapp.k14s.io/original-diff-md5: e053721e8727e53200a8cd368a0681d9
4, 6 creationTimestamp: "2020-11-16T15:31:49Z"
5, 7 generation: 21510612
@@ update validatingwebhookconfiguration/validation.webhook.kpack.io (admissionregistration.k8s.io/v1beta1) cluster @@
...
2, 2 metadata:
3 - annotations: {}
4, 3 creationTimestamp: "2020-11-16T15:31:49Z"
5, 4 generation: 22009278
...
79, 78 namespace: kpack
80 - path: /validate
81 - port: 443
82, 79 failurePolicy: Fail
83 - matchPolicy: Exact
84, 80 name: validation.webhook.kpack.io
85 - namespaceSelector:
86 - matchExpressions:
87 - - key: webhooks.knative.dev/exclude
88 - operator: DoesNotExist
89 - - key: control-plane
90 - operator: DoesNotExist
91 - objectSelector: {}
92 - rules:
93 - - apiGroups:
94 - - kpack.io
95 - apiVersions:
96 - - v1alpha1
97 - operations:
98 - - CREATE
99 - - UPDATE
100 - - DELETE
101 - resources:
102 - - builders/*
103 - scope: '*'
104 - - apiGroups:
105 - - kpack.io
106 - apiVersions:
107 - - v1alpha1
108 - operations:
109 - - CREATE
110 - - UPDATE
111 - - DELETE
112 - resources:
113 - - builds/*
114 - scope: '*'
115 - - apiGroups:
116 - - kpack.io
117 - apiVersions:
118 - - v1alpha1
119 - operations:
120 - - CREATE
121 - - UPDATE
122 - - DELETE
123 - resources:
124 - - clusterbuilders/*
125 - scope: '*'
126 - - apiGroups:
127 - - kpack.io
128 - apiVersions:
129 - - v1alpha1
130 - operations:
131 - - CREATE
132 - - UPDATE
133 - - DELETE
134 - resources:
135 - - clusterstacks/*
136 - scope: '*'
137 - - apiGroups:
138 - - kpack.io
139 - apiVersions:
140 - - v1alpha1
141 - operations:
142 - - CREATE
143 - - UPDATE
144 - - DELETE
145 - resources:
146 - - clusterstores/*
147 - scope: '*'
148 - - apiGroups:
149 - - kpack.io
150 - apiVersions:
151 - - v1alpha1
152 - operations:
153 - - CREATE
154 - - UPDATE
155 - - DELETE
156 - resources:
157 - - images/*
158 - scope: '*'
159, 81 sideEffects: None
160 - timeoutSeconds: 30
161, 82
@@ update secret/webhook-certs (v1) namespace: kpack @@
0, 0 apiVersion: v1
1 - data:
2 - ca-cert.pem: <-- value not shown (#1)
3 - server-cert.pem: <-- value not shown (#2)
4 - server-key.pem: <-- value not shown (#3)
5, 1 kind: Secret
6, 2 metadata:
7 - annotations: {}
8, 3 creationTimestamp: "2020-11-16T15:31:58Z"
9, 4 labels:
...
45, 40 uid: 7e38f5e9-9600-44d9-bd8b-f17d164b41f3
46 - type: Opaque
47, 41
Additionally, the deployment will sometimes fail because one of these resources changes while we are running the deployment.
I'm interested in completing this; however, it is not possible until the following are fixed in kapp:
- https://github.com/vmware-tanzu/carvel-kapp/issues/190
- https://github.com/vmware-tanzu/carvel-kapp/issues/191
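Until those land, some of the fixed-path diffs above, e.g. the kpack webhook-certs Secret whose data is populated by the webhook controller at runtime, might already be suppressible with a plain rebase rule in a kapp Config document. A rough, untested sketch; verify the matcher and path syntax against the kapp config docs:

```yaml
apiVersion: kapp.k14s.io/v1alpha1
kind: Config
rebaseRules:
# Keep the certificate material the kpack webhook controller writes into the
# Secret at runtime instead of reverting it to the empty value in the config.
- path: [data]
  type: copy
  sources: [existing]
  resourceMatchers:
  - kindNamespaceNameMatcher:
      kind: Secret
      namespace: kpack
      name: webhook-certs
```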