troubleshoot icon indicating copy to clipboard operation
troubleshoot copied to clipboard

Allocatable preflight checks are not flexible enough to use

Open MikaelSmith opened this issue 5 years ago • 8 comments

The allocatable preflight checks seem useful to make sure that the cluster has enough resources we won't run into taints while installing the new application. However that only applies on new installs; on upgrade they may fail but it won't matter because the upgrade is just replacing existing resources.

MikaelSmith avatar May 29 '20 17:05 MikaelSmith

Hi @MikaelSmith, what if you could specify a matchlabel selector for the installed deployment, so the analyzer checks first if a certain deployment already exists? If it's not deployed then runs the check of allocatable resources, and if it is already deployed pass. Would that solve the issue?

manavellamnimble avatar Oct 23 '20 18:10 manavellamnimble

That seems like it could work.

To some extent resources are still needed to perform the upgrade: need to spin up a new replica before terminating the previous one. It seems like I could use that conditional and have two checks: one for enough space to perform upgrades, the other only for new installs.

MikaelSmith avatar Oct 23 '20 18:10 MikaelSmith

Thats sounds right. Im thinking something like this, using the keywords onInstall and onUpgrade together with the application.matchLabel field. The analyzer checks if the deployment with the specified label exists (e.g: app=myapp) and acts in consequence. @markpundsack @divolgin what are your thoughts about this?

- nodeResources:
        checkName: Check resources to install/upgrade the app.
        application:
          matchLabel:
            - app=myapp
        onInstall:
          filters:
            allocatableMemory: 16Gi
            cpuAllocatable: "6"
            selector:
                matchLabels: 
                kubernetes.io/role=app-primary-replica
          outcomes:
            - fail:
                when: "count() < 1"
                message: Must have 1 node with 16 GB (available) memory and 6 cores (on a single node) to install the application.
            - pass:
                message: This cluster has a node with enough memory and cpu to install the application. 
        onUpgrade:
          filters:
            allocatableMemory: 8Gi
            cpuAllocatable: "3"
          outcomes:
            - fail:
                when: "count() < 1"
                message: Must have 1 node with 8 GB (available) memory and 3 cores (on a single node) to Upgrade the app.
            - pass:
                message: This cluster has a node with enough memory and cpu capacity to upgrade your app.

manavellamnimble avatar Oct 23 '20 19:10 manavellamnimble

The basic idea is interesting, but I'm not following how the onInstall and onUpgrade would work in practice.

markpundsack avatar Oct 23 '20 21:10 markpundsack

@markpundsack first of all, preflight would check if the application matching the label in the application is installed. If it is not, then the filters and outcomes under onInstall would be applied. If the app already exists, that would mean it is an upgrade, and the filters and outcomes under onUpgrade would be applied.

Lets say you are deploying an app with 4 replicas. If it is a new install, then you must assure that the resources for the 4 replicas are available in the node. But if it is an upgrade, and you are updating 2 pods at a time, you need to check that there are enough resources just for 2 of them. As it works today, nodeResourceAnalyzer will keep checking for resources enough for 4 pods and throw an error, but if you can separate the requirements under onInstall and onUpgrade this is not a problem anymore.

manavellamnimble avatar Oct 23 '20 23:10 manavellamnimble

@MikaelSmith if we provided a template function in kots to exclude this analyzer during upgrades, would that solve this issue? This seems more like a kots problem and not really a troubleshoot problem. It would also allow you to define two analyzers: one for upgrades and one for initial installs if you wanted to (and not just for the NodeResource analyzer), which allows the same behavior as the one here.

divolgin avatar Oct 29 '20 22:10 divolgin

Yeah, that seems fine.

MikaelSmith avatar Oct 29 '20 22:10 MikaelSmith

Being able to tell between an install and upgrade would be useful for lots of things in Config too.

MikaelSmith avatar Oct 31 '20 04:10 MikaelSmith