safeguard from adoption of a resource by kapp
This morning kapp created major havoc on the cluster when I was trying it out.

Deployment command:

```
helmfile -e sandbox template | kapp -y deploy -a postgres -f -
```

helmfile is a wrapper on top of helm, and `helmfile -e sandbox template` generates definitions for four objects (all objects listed below except the first one). I am not sure why kapp thinks it needs to update the postgres/PostgresVersion resource.
```
Namespace  Name                 Kind             Conds.  Age  Op      Wait to    Rs  Ri
(cluster)  postgres             PostgresVersion  -       5m   update  reconcile  ok  -
pg2        postgres             Postgres         -       -    create  reconcile  -   -
^          postgres-gcs-secret  Secret           -       -    create  reconcile  -   -
^          postgres-init        ConfigMap        -       -    create  reconcile  -   -
^          postgres-secret      Secret           -       -    create  reconcile  -   -

Op:      4 create, 0 delete, 1 update, 0 noop
Wait to: 5 reconcile, 0 delete, 0 noop
```
I then ran `kapp delete -a postgres`, and as you can see kapp decided to delete the postgres/PostgresVersion resource it didn't create:
```
Namespace  Name                 Kind             Conds.  Age  Op      Wait to  Rs  Ri
(cluster)  postgres             PostgresVersion  -       13m  delete  delete   ok  -
pg2        postgres             AppBinding       -       7m   -       delete   ok  -
^          postgres             Endpoints        -       7m   -       delete   ok  -
^          postgres             Postgres         -       7m   delete  delete   ok  -
^          postgres             Role             -       7m   -       delete   ok  -
^          postgres             RoleBinding      -       7m   -       delete   ok  -
^          postgres             Service          -       7m   -       delete   ok  -
^          postgres             ServiceAccount   -       7m   -       delete   ok  -
^          postgres             StatefulSet      -       7m   -       delete   ok  -
^          postgres-gcs-secret  Secret           -       7m   delete  delete   ok  -
^          postgres-init        ConfigMap        -       7m   delete  delete   ok  -
^          postgres-replicas    Endpoints        -       7m   -       delete   ok  -
^          postgres-replicas    Service          -       7m   -       delete   ok  -
^          postgres-secret      Secret           -       7m   delete  delete   ok  -
^          postgres-snapshot    Role             -       7m   -       delete   ok  -
^          postgres-snapshot    RoleBinding      -       7m   -       delete   ok  -
^          postgres-snapshot    ServiceAccount   -       7m   -       delete   ok  -

Op:      0 create, 5 delete, 0 update, 12 noop
Wait to: 0 reconcile, 17 delete, 0 noop
```
I didn't spot it right away (my fault), but needless to say, I don't think kapp should be deleting a (cluster-wide) resource it didn't create.

Note: the postgres resources in the chart are managed by the [kubedb operator](https://kubedb.com/docs/0.12.0/welcome/). I had to re-install the kubedb operator to fix cluster operations.

I was really excited about kapp, but the above is a major issue, which obviously prevents us from using the tool. I am just curious as to why kapp behaves this way. This very well might be an issue with kubedb, but the lack of a safeguard in kapp against deleting something it didn't create still worries me.
> This morning kapp created major havoc on the cluster when I was trying it out
that's unfortunate. let's see what happened...
> helmfile -e sandbox template | kapp -y deploy -a postgres -f - generates definitions for four objects, all objects listed below except the first one.
based on your first snippet kapp was updating the (cluster) postgres PostgresVersion resource. given this information kapp did receive that resource from the `helmfile -e sandbox template` output. can you share a snippet of what helmfile is returning?
one way to have more control over kapp (especially during experimentation) when piping output is to use process substitution; this way kapp will prompt you on whether you want to apply changes:

```
kapp deploy -a postgres -f <(helmfile -e sandbox template)
```
you can also throw in -c (or the longer --diff-changes) to see the full diff.
> I don't think kapp should be deleting the (cluster-wide) resource it didn't create.
because kapp received that resource in its configuration, it began to own that resource (because it made updates to it). since it was "owned" by kapp, when you asked to delete that application it deleted all resources that were owned by it, including that one.
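The ownership rule described above can be pictured with a small, purely illustrative model (this is not kapp's actual implementation, just a sketch of the semantics): an app takes ownership of every resource that appears in its configuration, whether it created the resource or merely updated a pre-existing one, and deleting the app deletes everything it owns.

```python
# Conceptual sketch of kapp-style ownership semantics; not kapp's real code.

class Cluster:
    def __init__(self):
        self.resources = {}  # resource name -> owning app (None = unowned)

    def pre_create(self, name):
        """A resource created outside any app (e.g. by an operator install)."""
        self.resources[name] = None

    def deploy(self, app, config_resources):
        """Applying a config makes the app own every resource in it --
        including pre-existing resources that are only being updated."""
        for name in config_resources:
            self.resources[name] = app

    def delete_app(self, app):
        """Deleting the app deletes every resource it owns, including
        pre-existing ones it adopted during deploy."""
        self.resources = {n: o for n, o in self.resources.items() if o != app}


cluster = Cluster()
cluster.pre_create("PostgresVersion/postgres")  # installed by the operator
cluster.deploy("postgres", ["PostgresVersion/postgres", "Postgres/pg2"])
cluster.delete_app("postgres")
# the pre-existing PostgresVersion is gone too, because deploy adopted it
print(cluster.resources)  # -> {}
```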
> but a lack of safeguard in kapp for deleting something it didn't create still worries me.
one safeguard that we did build into kapp is to present an action summary before executing it. in this case that did not quite help you. i could see adding additional safeguards, but i'm still not clear why the helmfile output included a resource you did not expect. perhaps that should have been highlighted in a more prominent manner, but even then, since you were running with -y, you wouldn't have had an opportunity to do anything about it.
to me the "problem" happened earlier -- at the time of the initial deploy, when the resource was updated -- i.e. what should we do with resources that we need to update and that are already found in the cluster? kapp, today, assumes that it will take ownership of them, which is typically desired.
> given this information kapp did receive that resource from the helmfile -e sandbox template output. can you share a snippet of what helmfile is returning?
Yes, you are right, I rushed through the original analysis, my apologies. Thank you for being super patient and helpful. Below is the scrubbed version of the `helmfile -e sandbox template` output:
```yaml
---
# Source: postgres/templates/db_secrets.yaml
apiVersion: v1
kind: Secret
metadata:
  name: postgres-secret
  labels:
    app.kubernetes.io/name: postgres
    helm.sh/chart: postgres-0.6.0
    app.kubernetes.io/instance: postgres
    app.kubernetes.io/managed-by: Helm
type: Opaque
data:
  POSTGRES_PASSWORD: "<scrubbed>"
---
# Source: postgres/templates/gcs_secrets.yaml
apiVersion: v1
kind: Secret
metadata:
  name: postgres-gcs-secret
  labels:
    app.kubernetes.io/name: postgres
    helm.sh/chart: postgres-0.6.0
    app.kubernetes.io/instance: postgres
    app.kubernetes.io/managed-by: Helm
type: Opaque
data:
  GOOGLE_PROJECT_ID: "<scrubbed>"
  GOOGLE_SERVICE_ACCOUNT_JSON_KEY: "<scrubbed>"
---
# Source: postgres/templates/configmap.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: postgres-init
  labels:
    app.kubernetes.io/name: postgres
    helm.sh/chart: postgres-0.6.0
    app.kubernetes.io/instance: postgres
    app.kubernetes.io/managed-by: Helm
binaryData:
  init.sql: "<scrubbed>"
---
# Source: postgres/templates/postgres.yaml
apiVersion: kubedb.com/v1alpha1
kind: Postgres
metadata:
  name: postgres
  labels:
    app.kubernetes.io/name: postgres
    helm.sh/chart: postgres-0.6.0
    app.kubernetes.io/instance: postgres
    app.kubernetes.io/managed-by: Helm
spec:
  replicas: 1
  version: postgres
  databaseSecret:
    secretName: postgres-secret
  storageType: Durable
  storage:
    storageClassName: "standard"
    standbyMode: Warm
    accessModes:
      - ReadWriteOnce
    resources:
      requests:
        storage: 10Gi
  terminationPolicy: WipeOut
  archiver:
    storage:
      storageSecretName: postgres-gcs-secret
      gcs:
        bucket: <scrubbed>
        prefix: pg2
  init:
    scriptSource:
      configMap:
        name: postgres-init
---
# Source: postgres/templates/postgres_version.yaml
apiVersion: catalog.kubedb.com/v1alpha1
kind: PostgresVersion
metadata:
  name: postgres
  labels:
    app.kubernetes.io/name: postgres
    helm.sh/chart: postgres-0.6.0
    app.kubernetes.io/instance: postgres
    app.kubernetes.io/managed-by: Helm
  annotations:
    "helm.sh/hook": pre-install
    "helm.sh/hook-delete-policy": "before-hook-creation"
spec:
  version: 9.6-v3
  podSecurityPolicies:
    databasePolicyName: "postgres-db"
    snapshotterPolicyName: "postgres-snapshot"
  db:
    image: "<scrubbed>"
  exporter:
    image: "kubedb/postgres_exporter:v0.4.6"
  tools:
    image: "kubedb/postgres-tools:9.6-v3"
```
The interesting part is that the PostgresVersion object has Helm hook annotations, which explains why Helm doesn't delete it. Does kapp by any chance have a mechanism that emulates Helm behavior (as bad as it can be)? Is there an annotation that would make kapp delete skip resource deletion?
> Thank you for being super patient and helpful.
here to help :)
> Does kapp by any chance have a mechanism that emulates Helm behavior (as bad as it can be)? Is there an annotation that would make kapp delete skip resource deletion?
there is a kapp.k14s.io/delete-strategy annotation with an "orphan" value (https://github.com/k14s/kapp/blob/develop/docs/apply.md#controlling-apply-via-resource-annotations); however, the next time you deploy with kapp and that resource is defined, kapp will complain that the resource was already "owned" by a different (deleted) app. arguably we should add another strategy value like "orphan-and-disassociate". (there is a way to get around that, but i discourage its use: https://github.com/k14s/kapp/blob/develop/docs/dangerous-flags.md#--dangerous-override-ownership-of-existing-resources)
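For reference, the "orphan" strategy is set as an annotation on the resource itself. A minimal sketch (the annotation name and value come from the kapp docs linked above; the surrounding fields are just the PostgresVersion from this thread):

```yaml
apiVersion: catalog.kubedb.com/v1alpha1
kind: PostgresVersion
metadata:
  name: postgres
  annotations:
    # tells kapp to leave this resource in place when the app is deleted
    kapp.k14s.io/delete-strategy: orphan
```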
it seems that this resource is more of a "global resource" (one that defines available postgres versions) than a resource that is related to this particular postgres app. could you separate it out into a different kapp deploy? you could do it via built-in filtering or via tools like get-ytt.io.
```
# deploy shared stuff
kapp deploy -a shared-pg -f <(...) --filter-kind PostgresVersion

# deploy an instance of postgres
kapp deploy -a pg1 -f <(...) --filter '{"not":{"resource":{"kinds":["PostgresVersion"]}}}'
```
Thank you for the additional info.
> it seems that this resource is more of a "global resource" (that defines available postgres versions) than a resource that is related to this particular postgres app
Yes, you are right. The original Helm chart is doing something very questionable: a chart that is deployed into a namespace should not be overriding a global (cluster-wide) object. I picked that particular chart to experiment with kapp because we are having an issue deleting the Kubernetes objects deployed by the chart: the objects get deleted in the wrong order, which causes the kubedb finalizer to get stuck. I thought that chart would be a good one for trying out kapp's ordering capabilities. I wasn't the original developer for the chart, and I missed the chart's questionable behavior of updating the global object. Thank you for the proposed workaround though.
> A chart that is deployed into a namespace should not be overriding a global (cluster-wide) object
well, that depends. i don't personally see anything wrong with cluster-level objects being in an "app"; however, kapp follows a philosophy that only one app owns a particular object. that leads to clearer ownership semantics without getting into reference-counting-style tracking.
> I wasn't the original developer for the chart, and I missed the chart's questionable behavior of updating the global object. Thank you for the proposed workaround though.
understood. btw, here is another alternative that i personally prefer: use get-ytt.io to apply an overlay that makes the PostgresVersion cluster resource specific to a particular application. if you decide to deploy multiple of those, you can give each PostgresVersion a different name.
```
kapp -y deploy -a postgres -f <(helmfile -e sandbox template | ytt --ignore-unknown-comments -f- -f fix-version.yml)
```

fix-version.yml:

```yaml
#@ load("@ytt:overlay", "overlay")

#@overlay/match by=overlay.subset({"kind":"PostgresVersion"})
---
metadata:
  name: app1-postgres-version

#@overlay/match by=overlay.subset({"kind":"Postgres"})
---
spec:
  version: app1-postgres-version
```
PostgresVersion belongs to another chart, a third-party kubedb-catalog chart. That is why I am saying that updating PostgresVersion from our custom chart is wrong: we are breaking the single-owner rule. The scopes (cluster vs namespace) and the volatilities (when the two charts get installed/updated/removed) do not match either, so our chart has no business updating PostgresVersion.

If we need to update the docker image referenced by the PostgresVersion manifest (which is the whole reason we are updating it), we should update the kubedb-catalog chart instead. I am using Helm terminology, but obviously the same can be done using two separate kapp apps.
> That is why I am saying that updating PostgresVersion from our custom chart is wrong
ah, this makes sense to me now given how it's set up. i didn't realize multiple charts were involved here.