troubleshoot
conditional NodeResourceAnalyzer
@markpundsack @divolgin The idea is to make NodeResourceAnalyzer more flexible when working with allocatable resources. As stated in issue #210, the allocatable resources needed for an update are not the same as for a new install. I added the ability to check whether a given deployment exists in a namespace, and to separate the required resources depending on whether it is a new install or an update. For this purpose I created three new fields: deployment, containing the name and namespace of the deployment; onInstall, where the filters and outcomes for new installs are provided; and onUpdate, where alternative filters and outcomes may be provided for the case where the deployment already exists.
This does not change the way the analyzer functioned originally. An example would be as follows:
```yaml
- nodeResources:
    checkName: check allocatable resources for new installs or updates
    deployment:
      name: myapp
      namespace: default
    onInstall:
      filters:
        cpuAllocatable: "5"
        memoryAllocatable: 5Gi
      outcomes:
        - fail:
            when: "count() < 1"
            message: On new installs, this application requires at least 1 node with 5 allocatable CPUs and 5Gi of allocatable memory.
            uri: https://kurl.sh/docs/install-with-kurl/adding-nodes
        - warn:
            when: "count() < 2"
            message: On new installs, this application requires at least 2 nodes with 5 allocatable CPUs and 5Gi of allocatable memory.
            uri: https://kurl.sh/docs/install-with-kurl/adding-nodes
        - pass:
            message: This cluster has enough nodes.
    onUpdate:
      filters:
        cpuAllocatable: "2"
        memoryAllocatable: 2Gi
      outcomes:
        - fail:
            when: "count() < 1"
            message: On updates, this application requires at least 1 node with 2 allocatable CPUs and 2Gi of allocatable memory.
            uri: https://kurl.sh/docs/install-with-kurl/adding-nodes
        - warn:
            when: "count() < 2"
            message: This application recommends at least 2 nodes with 2 allocatable CPUs and 2Gi of allocatable memory.
            uri: https://kurl.sh/docs/install-with-kurl/adding-nodes
        - pass:
            message: This cluster has enough nodes to update.
```
Fix #210
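The install-vs-update branch this spec describes could be sketched roughly as follows. This is a minimal illustration, not the actual troubleshoot implementation: `pickFilters`, `deploymentExists`, and the struct fields are hypothetical names, and the deployment lookup is simulated with a map instead of a real API call (which would use client-go).

```go
package main

import "fmt"

// Filters mirrors an illustrative subset of the nodeResources filter block.
type Filters struct {
	CPUAllocatable    string
	MemoryAllocatable string
}

// analyzerSpec holds the two conditional branches proposed in this PR.
type analyzerSpec struct {
	OnInstall Filters
	OnUpdate  Filters
}

// deploymentExists is a stand-in for a real cluster lookup
// (e.g. client-go's AppsV1().Deployments(namespace).Get(...)).
func deploymentExists(existing map[string]bool, namespace, name string) bool {
	return existing[namespace+"/"+name]
}

// pickFilters returns the onUpdate filters when the deployment is already
// present, otherwise the onInstall filters.
func pickFilters(spec analyzerSpec, exists bool) Filters {
	if exists {
		return spec.OnUpdate
	}
	return spec.OnInstall
}

func main() {
	spec := analyzerSpec{
		OnInstall: Filters{CPUAllocatable: "5", MemoryAllocatable: "5Gi"},
		OnUpdate:  Filters{CPUAllocatable: "2", MemoryAllocatable: "2Gi"},
	}
	// Simulated cluster state: myapp is already deployed in "default",
	// so the update thresholds apply.
	cluster := map[string]bool{"default/myapp": true}

	exists := deploymentExists(cluster, "default", "myapp")
	f := pickFilters(spec, exists)
	fmt.Println(f.CPUAllocatable, f.MemoryAllocatable) // → 2 2Gi
}
```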
@markpundsack @manavellamnimble I agree with the problem identified here, but I'm not sure I agree with the solution.
Troubleshoot doesn't know about "install" vs "upgrade" workflows, that's a KOTS detail. How would this be used in an OSS workflow, outside of KOTS?
I think we need to go back to the design here a little. Label selectors feel like the right approach, but the "onInstall" and "onUpgrade" fields feel like they wrap up some magic.
The goal here should be to somehow collect node resources without the podspecs that match the label selector (or only the ones that do).
I think the more "k8s native" design is to have a matchLabels selector in the analyzer spec. We should collect the specified node resources, but subtract out the podspecs that match the label selector to determine the availability on the node without the matching pods.
I'm not sure what I'm proposing is exactly right. But I'd like to have more discussion before this change is merged in.
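If I read the proposal right, the selector-based computation might look something like this minimal sketch. Everything here is illustrative: `pod`, `matches`, and `availableWithout` are hypothetical names, quantities are CPU-only in millicores, and a real version would use client-go pod listings and `resource.Quantity` arithmetic.

```go
package main

import "fmt"

// pod holds the fields relevant to the proposal: its labels and the sum
// of its containers' CPU requests (spec.containers[].resources.requests).
type pod struct {
	labels     map[string]string
	cpuRequest int64 // millicores
}

// matches reports whether a pod carries every key/value pair in the
// selector, mimicking a matchLabels selector.
func matches(p pod, selector map[string]string) bool {
	for k, v := range selector {
		if p.labels[k] != v {
			return false
		}
	}
	return true
}

// availableWithout returns the node's allocatable CPU minus the requests
// of pods that do NOT match the selector, i.e. the capacity the node
// would offer if the app's own (matching) pods were excluded.
func availableWithout(allocatable int64, pods []pod, selector map[string]string) int64 {
	avail := allocatable
	for _, p := range pods {
		if !matches(p, selector) {
			avail -= p.cpuRequest
		}
	}
	return avail
}

func main() {
	pods := []pod{
		{labels: map[string]string{"app": "myapp"}, cpuRequest: 1500},
		{labels: map[string]string{"app": "other"}, cpuRequest: 500},
	}
	// 4000m allocatable; only the non-matching pod's 500m is subtracted.
	fmt.Println(availableWithout(4000, pods, map[string]string{"app": "myapp"})) // → 3500
}
```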
@marccampbell I thought of something similar at first, using the spec.containers[].resources.requests[] field of the pods matching a certain label, but I wasn't sure whether all vendors reserve resources in this fashion. If you are ok with it, I will try this approach.