troubleshoot
troubleshoot copied to clipboard
Allocatable preflight checks are not flexible enough to use
The allocatable preflight checks seem useful to make sure that the cluster has enough resources we won't run into taints while installing the new application. However that only applies on new installs; on upgrade they may fail but it won't matter because the upgrade is just replacing existing resources.
Hi @MikaelSmith, what if you could specify a matchlabel selector for the installed deployment, so the analyzer checks first if a certain deployment already exists? If it's not deployed then runs the check of allocatable resources, and if it is already deployed pass. Would that solve the issue?
That seems like it could work.
To some extent resources are still needed to perform the upgrade: need to spin up a new replica before terminating the previous one. It seems like I could use that conditional and have two checks: one for enough space to perform upgrades, the other only for new installs.
Thats sounds right. Im thinking something like this, using the keywords onInstall and onUpgrade together with the application.matchLabel field. The analyzer checks if the deployment with the specified label exists (e.g: app=myapp) and acts in consequence. @markpundsack @divolgin what are your thoughts about this?
- nodeResources:
checkName: Check resources to install/upgrade the app.
application:
matchLabel:
- app=myapp
onInstall:
filters:
allocatableMemory: 16Gi
cpuAllocatable: "6"
selector:
matchLabels:
kubernetes.io/role=app-primary-replica
outcomes:
- fail:
when: "count() < 1"
message: Must have 1 node with 16 GB (available) memory and 6 cores (on a single node) to install the application.
- pass:
message: This cluster has a node with enough memory and cpu to install the application.
onUpgrade:
filters:
allocatableMemory: 8Gi
cpuAllocatable: "3"
outcomes:
- fail:
when: "count() < 1"
message: Must have 1 node with 8 GB (available) memory and 3 cores (on a single node) to Upgrade the app.
- pass:
message: This cluster has a node with enough memory and cpu capacity to upgrade your app.
The basic idea is interesting, but I'm not following how the onInstall and onUpgrade would work in practice.
@markpundsack first of all, preflight would check if the application matching the label in the application is installed. If it is not, then the filters and outcomes under onInstall would be applied. If the app already exists, that would mean it is an upgrade, and the filters and outcomes under onUpgrade would be applied.
Lets say you are deploying an app with 4 replicas. If it is a new install, then you must assure that the resources for the 4 replicas are available in the node. But if it is an upgrade, and you are updating 2 pods at a time, you need to check that there are enough resources just for 2 of them. As it works today, nodeResourceAnalyzer will keep checking for resources enough for 4 pods and throw an error, but if you can separate the requirements under onInstall and onUpgrade this is not a problem anymore.
@MikaelSmith if we provided a template function in kots to exclude this analyzer during upgrades, would that solve this issue? This seems more like a kots problem and not really a troubleshoot problem. It would also allow you to define two analyzers: one for upgrades and one for initial installs if you wanted to (and not just for the NodeResource analyzer), which allows the same behavior as the one here.
Yeah, that seems fine.
Being able to tell between an install and upgrade would be useful for lots of things in Config too.