Enable analyzers that work on "available" resources: in other words those not already reserved
Describe the rationale for the suggested feature.
I'd like to be able to include logic around "available" resources on a node when writing analyzers that deal with node resources. This will help me determine whether my Kubernetes cluster will be able to schedule my pod before I attempt my install, assuming I align my check with my resource requests.
Describe the feature
With this feature implemented, I'd be able to write a preflight that looks like this:
- nodeResources:
    checkName: Are sufficient CPU resources available in the cluster
    outcomes:
      - fail:
          when: "min(cpuAvailable) < 250m"
          message: Your cluster currently has too few CPU resources available to install Gitea
      - pass:
          message: Your cluster has sufficient CPU resources available to install Gitea
- nodeResources:
    checkName: Is sufficient memory available in the cluster
    outcomes:
      - fail:
          when: "min(memoryAvailable) < 256Mi"
          message: Your cluster currently has too little memory available to install Gitea
      - pass:
          message: Your cluster has sufficient memory available to install Gitea
and fail the install if my resource requests could not be fulfilled on any node (or any node that I've filtered into my analyzer).
kubectl describe node provides insight into these values, but they are not available as part of the status of the node so just getting the node doesn't show them.
Describe alternatives you've considered
Describe alternative solutions here. Include any workarounds you've considered.
Additional context
It seems like the best way to handle this is to collect all the resource requests for the pods running on the node and subtract that from the allocatable resources on that node. Based on the order of the kubectl describe node output I'd bet that's what it is doing, though I haven't read through the code to check.
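To make that calculation concrete, here is a rough Python sketch of "allocatable minus the sum of pod requests" over plain dicts shaped like the JSON from `kubectl get node -o json` and `kubectl get pods -o json`. The function names `parse_quantity` and `available_resources` are made up for illustration. It makes simplifying assumptions: only regular containers are counted (no init containers or pod overhead), pods in the `Succeeded` or `Failed` phase are skipped, and only a subset of quantity suffixes is parsed. It is not a claim about what `kubectl describe node` actually does internally.

```python
from decimal import Decimal

# Multipliers for a subset of Kubernetes quantity suffixes (assumption:
# exponent forms such as "1e3" are not needed for this sketch).
_SUFFIXES = {
    "m": Decimal("0.001"),
    "k": Decimal(10) ** 3, "M": Decimal(10) ** 6,
    "G": Decimal(10) ** 9, "T": Decimal(10) ** 12,
    "Ki": Decimal(2) ** 10, "Mi": Decimal(2) ** 20,
    "Gi": Decimal(2) ** 30, "Ti": Decimal(2) ** 40,
}


def parse_quantity(quantity: str) -> Decimal:
    """Parse a quantity such as '250m', '2', or '256Mi' into base units."""
    # Check two-character suffixes first so "Mi" is not mistaken for "i"-less forms.
    for suffix in sorted(_SUFFIXES, key=len, reverse=True):
        if quantity.endswith(suffix):
            return Decimal(quantity[: -len(suffix)]) * _SUFFIXES[suffix]
    return Decimal(quantity)


def available_resources(node: dict, pods: list[dict]) -> dict[str, Decimal]:
    """Return allocatable minus summed container requests for pods on the node."""
    allocatable = node["status"]["allocatable"]
    available = {r: parse_quantity(allocatable[r]) for r in ("cpu", "memory")}
    for pod in pods:
        # Only pods scheduled onto this node count against it.
        if pod["spec"].get("nodeName") != node["metadata"]["name"]:
            continue
        # Assumption: finished pods no longer hold their requests.
        if pod.get("status", {}).get("phase") in ("Succeeded", "Failed"):
            continue
        for container in pod["spec"].get("containers", []):
            requests = container.get("resources", {}).get("requests", {})
            for resource in available:
                if resource in requests:
                    available[resource] -= parse_quantity(requests[resource])
    return available
```

CPU comes back in cores and memory in bytes, so a check like `min(cpuAvailable) < 250m` would compare against `parse_quantity("250m")`.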
I also asked ChatGPT to write me a kubectl plugin to calculate this, to see what the code might look like. I'm attaching it for fun and reference.
kubectl-available-plugin.tar.gz
@chris-sanders, @diamonwiggins, and I chatted about this on Slack.
At first it seemed like this would be feasible to do within the analyzer, since clusterResources would contain everything needed. Unfortunately it looks like it has to happen in the clusterResources collector in case the pods collected are limited to certain namespaces.
It seems like the best thing to do would be to add the value to every node when collecting node info. My first thought was to put it in the node's status, but that would break the type. Putting it in an annotation feels like it would make sense in that case: something like troubleshoot.sh/cpu-available and troubleshoot.sh/memory-available added to the nodes after getting the item list.
Any thoughts on this design?
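For discussion, a minimal Python sketch of the annotation idea, again over plain dicts shaped like a `NodeList` from `kubectl get nodes -o json`. The function name `annotate_nodes` and the shape of the `available` mapping (node name to quantity strings) are hypothetical. The real change would live in the Go collector, so treat this only as an illustration of the data flow.

```python
import copy

# Annotation keys proposed above.
CPU_KEY = "troubleshoot.sh/cpu-available"
MEMORY_KEY = "troubleshoot.sh/memory-available"


def annotate_nodes(node_list: dict, available: dict[str, dict[str, str]]) -> dict:
    """Return a copy of a NodeList with availability annotations added.

    `available` maps node name -> {"cpu": "...", "memory": "..."}; the
    values are stored as strings because annotation values must be strings.
    """
    annotated = copy.deepcopy(node_list)  # leave the collected data untouched
    for node in annotated.get("items", []):
        values = available.get(node["metadata"]["name"])
        if values is None:
            continue  # nothing was computed for this node
        # The annotations field may be missing or null on a node.
        annotations = node["metadata"].get("annotations") or {}
        annotations[CPU_KEY] = values["cpu"]
        annotations[MEMORY_KEY] = values["memory"]
        node["metadata"]["annotations"] = annotations
    return annotated
```

Because the node type itself is unchanged, an analyzer could read these two keys without any schema changes.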