kapp
[ordering] don't delete before created successfully?
We are seeing that kapp deletes resources before ensuring the new version was successfully created. In case of an issue we end up with all resources deleted. It should at least keep the existing ones in case of an error.
A possible approach would be to create first, then delete later.
Example where we ended up with a completely deleted application:
Namespace  Name                             Kind                   Conds.  Age  Op      Wait to    Rs  Ri
(cluster)  redmine-Union-efs                PersistentVolume       -       -    create  reconcile  -   -
redmine    Union                            Deployment             -       -    create  reconcile  -   -
^          Union                            Ingress                -       -    create  reconcile  -   -
^          Union                            Service                -       -    create  reconcile  -   -
^          database-yml-Union-ver-1         ConfigMap              -       -    create  reconcile  -   -
^          database-yml-o11n-redmine-ver-1  ConfigMap              -       5d   delete  delete     ok  -
^          database-yml-o11n-redmine-ver-2  ConfigMap              -       5d   delete  delete     ok  -
^          database-yml-o11n-redmine-ver-3  ConfigMap              -       2d   delete  delete     ok  -
^          o11n-redmine                     Deployment             2/2 t   6d   delete  delete     ok  -
^          o11n-redmine                     Ingress                -       6d   delete  delete     ok  -
^          o11n-redmine                     Service                -       6d   delete  delete     ok  -
^          redmine-Union-efs                PersistentVolumeClaim  -       -    create  reconcile  -   -
Op: 6 create, 6 delete, 0 update, 0 noop
Wait to: 6 reconcile, 6 delete, 0 noop
8:57:53AM: ---- applying 12 changes [0/12 done] ----
8:57:53AM: create configmap/database-yml-Union-ver-1 (v1) namespace: redmine
8:57:53AM: delete configmap/database-yml-o11n-redmine-ver-1 (v1) namespace: redmine
8:57:53AM: delete configmap/database-yml-o11n-redmine-ver-2 (v1) namespace: redmine
8:57:53AM: delete configmap/database-yml-o11n-redmine-ver-3 (v1) namespace: redmine
8:57:53AM: create persistentvolume/redmine-Union-efs (v1) cluster
8:57:53AM: create persistentvolumeclaim/redmine-Union-efs (v1) namespace: redmine
8:57:53AM: create deployment/Union (apps/v1beta1) namespace: redmine
8:57:53AM: create ingress/Union (extensions/v1beta1) namespace: redmine
8:57:53AM: create service/Union (v1) namespace: redmine
8:57:53AM: delete service/o11n-redmine (v1) namespace: redmine
8:57:53AM: delete ingress/o11n-redmine (extensions/v1beta1) namespace: redmine
8:57:53AM: delete deployment/o11n-redmine (apps/v1beta1) namespace: redmine
kapp: Error: Applying create configmap/database-yml-Union-ver-1 (v1) namespace: redmine: Creating resource configmap/database-yml-Union-ver-1 (v1) namespace: redmine: ConfigMap "database-yml-Union-ver-1" is invalid: metadata.name: Invalid value: "database-yml-Union-ver-1": a DNS-1123 subdomain must consist of lower case alphanumeric characters, '-' or '.', and must start and end with an alphanumeric character (e.g. 'example.com', regex used for validation is '[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*') (reason: Invalid)
🔥 error in kp.KappGitOpsDeploy exit status 1
kapp optimizes by default for as much parallelism as possible. since there was no immediate relationship between the resources being created and deleted, kapp does not stagger them in any way. you can specify ordering constraints via change rules: https://github.com/k14s/kapp/blob/develop/docs/apply-ordering.md. does that solve your use case?
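For illustration, a minimal sketch of the annotation-based ordering those docs describe (the resource names and the example.org/db-migrations group name are made up, not from this issue):

# hypothetical example: have kapp upsert the migration Job before the app Deployment
apiVersion: batch/v1
kind: Job
metadata:
  name: db-migrations
  annotations:
    kapp.k14s.io/change-group: "example.org/db-migrations"
spec:
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: migrate
        image: example/migrations:latest
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: app
  annotations:
    kapp.k14s.io/change-rule: "upsert after upserting example.org/db-migrations"
spec:
  selector:
    matchLabels: {app: app}
  template:
    metadata:
      labels: {app: app}
    spec:
      containers:
      - name: app
        image: example/app:latest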
Thank you for the response! Yes, I've been looking into the ordering possibilities already, but I don't think they are applicable to this case.
I believe what we see here is either a bug or a conceptual weakness: kapp deletes before creating!? Regardless of speed, if I understand correctly, at least for a certain amount of time the application is logically wiped away. So 1. delete, then 2. create again, correct?
Yes, I've been looking into the ordering possibilities already, but I don't think they are applicable to this case.
can you explain why it's not applicable? people have used change rules to order things like you describe and vice versa.
at least for a certain amount of time the application is logically wiped away
sure, because there are no specified constraints to do otherwise. if you have two unrelated resources, why should kapp create and then delete? what should happen if you have a business requirement to delete then create (for example if there are not enough resources on the cluster)?
can you explain why it's not applicable? people have used change rules to order things like you describe and vice versa.
change-rules tie actions on specific resources into a sequence relative to other resources. Our request is rather different: deletes should only happen after all new resources have been created. Hence I don't see a possibility to tie each resource to itself on replacement, or to kapp's internal ConfigMap versioning, for example.
what should happen if you have a business requirement to delete then create (for example if there are not enough resources on the cluster)?
So far we haven't seen such tight resource situations in k8s - the cluster typically deals with them in an automated fashion, using the cluster autoscaler, resource evictions and so forth - capacity fluctuates by design.
We like kapp so far because we believe it gives us higher confidence in our deployments, providing early feedback to the pipeline if the rollout does not succeed...
The behavior we discovered seems to go against the philosophy of this software, as it undermines its main concept of providing a more resilient rollout process - wouldn't you agree, @cppforlife?
Hence I don't see a possibility to tie each resource to itself on replacement
maybe im getting confused here. when a resource is changing (i.e. it continues to have the same name), it's not going thru a delete and create, it's just updated, so no ordering is needed with itself.
in your original issue description, you are deleting deployment o11n-redmine and creating an entirely new deployment Union (the two deployments are not tied together in any way). these changes, however, can be ordered by adding a rule "upsert before deleting x" where x is some change group that the o11n-redmine deployment (plus other resources like the service, ingress, etc) was part of.
i could even imagine kapp adding a "meta" change group "to-be-deleted" as a configuration convenience...
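To make that concrete, a rough sketch of the "upsert before deleting" approach for this case, assuming the previously deployed o11n-redmine resources had carried a change-group annotation when they were originally applied (names are illustrative, the group name redmine.example.org/app is made up, and specs are omitted for brevity):

# old resources (deployment/o11n-redmine, its service, ingress, configmaps)
# would have been applied earlier as members of a change group:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: o11n-redmine
  namespace: redmine
  annotations:
    kapp.k14s.io/change-group: "redmine.example.org/app"
# (spec omitted)
---
# new resources declare that they must be upserted before anything
# in that group gets deleted:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: union
  namespace: redmine
  annotations:
    kapp.k14s.io/change-rule: "upsert before deleting redmine.example.org/app"
# (spec omitted)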
The behavior we discovered seems to go against the philosophy of this software, as it undermines its main concept of providing a more resilient rollout process
kapp tries to help you with the deployment process by ensuring resources are converged (pruned, etc) based on the given configuration. more advanced scenarios that require ordering are supported, but currently require additional configuration (e.g. change rules).
im still trying to think where the middle ground is for this case, since i do not want to "slow down" the reconciliation process for most folks, but also want to cater to scenarios like you describe without too much additional configuration.
we are considering adding two meta change-groups: change-groups.kapp.k14s.io/to-be-upserted and change-groups.kapp.k14s.io/to-be-deleted. additionally, we are working on adding the ability to specify change rules via config, so this combination would provide a way to delete after upserting. (still thinking about making that a default though...)
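A rough sketch of how that combination might look, using the kapp Config resource format; the changeRuleBindings/resourceMatchers field names and the exact behavior of the proposed meta group are assumptions here, not confirmed syntax:

apiVersion: kapp.k14s.io/v1alpha1
kind: Config
# assumed: bind a rule to every resource so that all upserts complete
# before anything scheduled for deletion is removed
changeRuleBindings:
- rules:
  - "upsert before deleting change-groups.kapp.k14s.io/to-be-deleted"
  resourceMatchers:
  - allMatcher: {}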