
Arbitrary k8s readiness checks

Open maiamcc opened this issue 5 years ago • 6 comments

From @jfancher:

is there any workaround for, or plan to support, depending on readiness of arbitrary k8s objects? For example... I have a KafkaTopic CRD managed by an operator [which is its own resource b/c of a call to k8s_kind]; it obviously has no pods behind it as it's just doing some configuration, but it does have a readiness condition. It'd be nice to make apps that depend on it wait; also, it's moderately annoying that that resource never turns green in Tilt

This feature (in conjunction with #2989) could also help ClusterAPI; see e.g. this hack around readiness conditions:

local("kubectl apply -f https://github.com/jetstack/cert-manager/releases/download/{}/cert-manager.yaml".format(version))

# wait for the service to become available
local("kubectl wait --for=condition=Available --timeout=300s apiservice v1beta1.webhook.cert-manager.io")

I.e. all resources need to wait on this YAML being applied and the resources being ready. If Tilt supported arbitrary k8s readiness checks, this YAML could be its own resource, and only go green (i.e. only trigger its deps to build) when the wait condition was met.

maiamcc avatar May 07 '20 20:05 maiamcc

Current best workaround is probably this:

local_resource("wait-for-cert-manager", 
  cmd="kubectl wait --for=condition=Available --timeout=300s apiservice v1beta1.webhook.cert-manager.io",
  resource_deps=["cert-manager"] # I don't know what Tilt will have named this resource
)

and then for other resources, declare a resource_dep of wait-for-cert-manager.
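
For example (the "my-app" name below is just illustrative), a downstream resource would opt in like this:

# Any resource that needs cert-manager's webhook just depends on the gating local_resource.
k8s_resource("my-app", resource_deps=["wait-for-cert-manager"])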

maiamcc avatar Oct 30 '20 16:10 maiamcc

it might be worthwhile to adapt Helm's code for this, which is pretty nice:

https://github.com/helm/helm/blob/master/pkg/kube/wait.go#L55

nicks avatar Oct 30 '20 17:10 nicks

cross-posting from #1003, where @ahmetb says:

if a k8s workload object, including a CRD, doesn't create pods, there isn't a way for tilt to know when it's ready.

I think this definition is inherently problematic. For example, Knative Service CRD creates Pods but deletes them later if they don't receive any requests. The readiness definition for many CRDs (incl. Knative) is converging towards the kstatus convention (the snippet/link I added in my original comment). So checking the Pods is not a good indicator, but the CRD itself reports its readiness.

Would it be worthwhile to consider looking at status.conditions[*].type~Ready.status=="True" to determine readiness of other resources? I suspect this convention is only going to get more widespread, and it can be a good indicator when present.
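
For illustration, a minimal local check along those lines (assuming the object follows the kstatus-style Ready condition; the kind and name below are placeholders taken from the KafkaTopic example above) could be:

import subprocess

# Read the Ready condition of an arbitrary object via kubectl's jsonpath filter.
out = subprocess.check_output([
    'kubectl', 'get', 'kafkatopic', 'my-topic',
    '-o', 'jsonpath={.status.conditions[?(@.type=="Ready")].status}',
])
print('ready:', out.decode('utf-8').strip() == 'True')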

lizzthabet avatar Dec 15 '21 22:12 lizzthabet

Hi @ahmetb ! Ya, I think we investigated this idea. There are a few ways that Ready status (for production) isn't quite what you want in dev (where you want to know whether the current changes are ready).

The Knative Service CRD is a good example of this. We actually DON'T want to consider a service ready if Knative deleted the pods, because then Tilt doesn't have a place to live-update the files! The Knative Tilt extension README talks about this a bit: https://github.com/tilt-dev/tilt-extensions/tree/master/knative#scale-bounds

There's a longer discussion of how Tilt tracks pods in this KubeCon talk (which also has a shout-out to kubectl-tree :grinning:): https://youtu.be/gBZ7M3g3uF4

But I do generally like the idea of having a richer readiness API, either as:

  1. status conditions (like you mentioned), or
  2. a readiness probe that runs locally (rather than in-cluster)
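
For illustration only, option 1 might look something like the sketch below. The ready_condition parameter is hypothetical and does not exist in Tilt today; it is just a way to visualize the idea:

# HYPOTHETICAL sketch only: ready_condition is NOT a real Tilt parameter.
# The idea: ignore pod readiness and instead watch
# status.conditions[type=="Ready"].status == "True" on the object itself.
k8s_resource(
  'kafka-topic',
  pod_readiness='ignore',
  ready_condition='Ready',
)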

nicks avatar Dec 16 '21 22:12 nicks

@nicks

I definitely think that a custom way of defining readiness for each k8s_resource is necessary. I tried to approximate the same behavior with primitive APIs such as local_resource/k8s_custom_deploy, but so far I haven't found a good way of achieving this.

Assuming I'd like to help contribute this, could we talk about general guidelines for the implementation so I could potentially create a PR?

ahrakos avatar Apr 08 '25 05:04 ahrakos

@maiamcc @nicks

A nice workaround I created is as follows:

  • Create a utility function in the Tiltfile:
def k8s_resource_readiness(name, resource_readiness_cmd = '', resource_path = '', deps = [], **kwargs):
    apply_script_path = os.path.abspath('kube-apply.py')
    delete_script_path = os.path.abspath('kube-delete.py')

    final_deps = deps
    if resource_path:
        final_deps = final_deps + [resource_path]

    k8s_custom_deploy(
      name,
      apply_env={
        'RESOURCE_READINESS_CMD': resource_readiness_cmd,
        'RESOURCE_PATH': resource_path
      },
      delete_env={
        'RESOURCE_PATH': resource_path
      },
      apply_cmd=['python3', apply_script_path],
      delete_cmd=['python3', delete_script_path],
      deps=final_deps
    )

    k8s_resource(name, pod_readiness='ignore', **kwargs)
  • Create an applier Python file (kube-apply.py):
# -*- mode: Python -*-
import subprocess
import sys
import time
import os

# Debug: print the working directory so it shows up in the Tilt log.
subprocess.run(['pwd'], stdout=sys.stderr, stderr=sys.stderr)

resource_readiness_cmd = os.environ.get('RESOURCE_READINESS_CMD', None)
resource_path = os.environ.get('RESOURCE_PATH', None)

# Apply the resource manifest.
kubectl_cmd = ['kubectl', 'apply', '-f', resource_path]
print("Running cmd: %s" % kubectl_cmd, file=sys.stderr)
subprocess.run(
  kubectl_cmd,
  stdout=sys.stderr,
  stderr=sys.stderr,
  check=True
)

get_resource_data = ['kubectl', 'get', '-f', resource_path, '-o', "jsonpath={.kind}{' '}{.metadata.name}"]

print("Running cmd: %s" % get_resource_data, file=sys.stderr)
resource_data_raw = subprocess.check_output(get_resource_data)
resource_data = resource_data_raw.decode('utf-8').strip()
# print("Resource data " % resource_data, file=sys.stderr)
print(resource_data, file=sys.stderr)
final_resource_data = resource_data.split(' ')

get_cmd = ['kubectl', 'get', final_resource_data[0], final_resource_data[1], '-oyaml']
# get_status_cmd = ['kubectl', 'get', 'issuer', 'intermediate-ca-issuer', '-o', "jsonpath={.status.conditions[0].status}"]

# Poll until the readiness command succeeds (or give up after max_wait).
max_wait = 60       # seconds to wait at most
poll_interval = 10  # seconds between checks
start_time = time.time()

time.sleep(poll_interval)

while True:
    try:
        if resource_readiness_cmd:
            print("Waiting for the resource to become ready", file=sys.stderr)
            result = subprocess.run(
              resource_readiness_cmd.split(' '),
              stdout=sys.stderr,
              stderr=sys.stderr
            )
            print(result, file=sys.stderr)
            # A failing readiness command means "not ready yet"; retry below.
            result.check_returncode()

        # k8s_custom_deploy expects the applied objects' YAML on stdout,
        # so let kubectl get write straight to stdout once the resource exists.
        final = subprocess.run(get_cmd)
        final.check_returncode()

        print("Running cmd: %s, Status Code: %d" % (get_cmd, final.returncode), file=sys.stderr)
        sys.exit(0)
    except subprocess.CalledProcessError:
        # Not ready yet: keep polling until max_wait elapses.
        if time.time() - start_time > max_wait:
            sys.exit(1)
        time.sleep(poll_interval)
  • Create a delete Python file (kube-delete.py):
import subprocess
import sys
import os

resource_path = os.environ.get('RESOURCE_PATH', None)

# Delete the resource manifest.
kubectl_cmd = ['kubectl', 'delete', '-f', resource_path]
print("Running cmd: %s" % kubectl_cmd, file=sys.stderr)
completed = subprocess.run(kubectl_cmd)
completed.check_returncode()
  • Use it as follows (here, two resources that depend on each other, each with a custom readiness command):
k8s_resource_readiness(
  'intermediate-ca-issuer',
  resource_readiness_cmd='kubectl wait --for=condition=Ready issuer/intermediate-ca-issuer',
  resource_path='./issuer/intermediate.yaml',
  resource_deps=['root']
)

k8s_resource_readiness(
  'kafka-cert',
  resource_readiness_cmd='kubectl wait --for=condition=Ready certificate/kafka-cert',
  resource_path='./issuer/kafka-cert.yaml',
  resource_deps=['intermediate-ca-issuer']
)

Of course, this is only a working draft and can be extended for more use cases, but the general idea is to inject a generic command that checks the resource for readiness. I used kubectl wait, but you can also query the resources yourself and build your own conditions around that.
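
For instance (the KafkaTopic kind, names, and paths below are placeholders based on the original report), any command that exits non-zero until the condition holds can be injected:

# Placeholder example: gate on an arbitrary CRD's Ready condition.
k8s_resource_readiness(
  'kafka-topic',
  resource_readiness_cmd='kubectl wait --for=condition=Ready kafkatopic/my-topic --timeout=60s',
  resource_path='./kafka/topic.yaml',
)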

ahrakos avatar Apr 09 '25 20:04 ahrakos