Log handling from pods

Open nolar opened this issue 5 years ago • 0 comments

With silent handlers (spies) available for the built-in Kubernetes objects (#30), the next step would be to silently watch the pods' logs.

An example use case: monitor the logs for specific lines (by pattern), extract from them the KPIs of the process or its status, and put those on the Kubernetes object's status:

import kopf
import kubernetes

@kopf.on.log('', 'v1', 'pods',
             regex=r'model accuracy is (\d+\.\d+)%')
def accuracy_log(namespace, meta, patch, log, match, **kwargs):
    model_name = meta.get('labels', {}).get('model')
    accuracy = float(match.group(1))
    accuracy_str = f'{accuracy:.2f}%'  # two decimal places, e.g. "87.23%"

    api = kubernetes.client.CustomObjectsApi()
    api.patch_namespaced_custom_object(
        group='zalando.org', 
        version='v1',
        plural='trainingjobs',
        namespace=namespace,
        name=model_name, 
        body={'status': {'accuracy': accuracy_str}},
    )

@kopf.on.log('', 'v1', 'pods',
             regex=r'Traceback \(most recent call last\):')  # parentheses escaped to match literally
def error_log(namespace, meta, patch, log, match, **kwargs):
    model_name = meta.get('labels', {}).get('model')
    api = kubernetes.client.CustomObjectsApi()
    api.patch_namespaced_custom_object(
        group='zalando.org', 
        version='v1',
        plural='trainingjobs',
        namespace=namespace,
        name=model_name, 
        body={'status': {'training': 'FAILED'}},
    )
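
The `@kopf.on.log` decorator above is only proposed in this issue, so the matching step it implies can be tried out in plain Python without Kopf or a cluster. The following is a minimal sketch under assumptions: the `REGISTRY` list, the `collect_status` function, and the handler signatures are hypothetical stand-ins for whatever such a decorator might build, and the status is collected in a dict instead of being patched onto a custom object.

```python
import re
from typing import Callable, Dict, Iterable, List, Tuple

# Hypothetical stand-in for a registry that a decorator such as the proposed
# @kopf.on.log might build: a list of (compiled pattern, callback) pairs.
LogHandler = Callable[[re.Match, Dict[str, str]], None]
Registry = List[Tuple[re.Pattern, LogHandler]]


def on_accuracy(match: re.Match, status: Dict[str, str]) -> None:
    # Keep two decimal places, e.g. "87.23%".
    status['accuracy'] = f'{float(match.group(1)):.2f}%'


def on_traceback(match: re.Match, status: Dict[str, str]) -> None:
    status['training'] = 'FAILED'


REGISTRY: Registry = [
    (re.compile(r'model accuracy is (\d+\.\d+)%'), on_accuracy),
    # The parentheses are escaped so that they match literally.
    (re.compile(r'Traceback \(most recent call last\):'), on_traceback),
]


def collect_status(lines: Iterable[str], registry: Registry = REGISTRY) -> Dict[str, str]:
    """Run each log line through every registered pattern and return the resulting status."""
    status: Dict[str, str] = {}
    for line in lines:
        for pattern, handler in registry:
            match = pattern.search(line)
            if match is not None:
                handler(match, status)
    return status
```

For instance, `collect_status(['epoch 3: model accuracy is 87.23%'])` yields a dict with the `accuracy` key set, and a later matching line overwrites an earlier one. In a real implementation the dict would be replaced by the patch call shown in the handlers.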

Important: Some filtering by labels is probably needed, so that we watch only the pods of interest rather than all pods (there can be a lot of them). E.g., filter by the presence of the model label in the examples above, so that only the model pods are taken into account. See #45.
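
The label check itself is simple. Here is a minimal sketch, assuming pod metadata arrives as a plain mapping shaped like the `meta` argument in the handlers; the function names `is_model_pod` and `select_model_pods` are hypothetical and not part of Kopf.

```python
from typing import Any, Iterable, List, Mapping


def is_model_pod(meta: Mapping[str, Any], label: str = 'model') -> bool:
    """Return True if the pod metadata carries the given label key."""
    # `labels` may be missing or explicitly None on a pod without labels.
    labels = meta.get('labels') or {}
    return label in labels


def select_model_pods(pods: Iterable[Mapping[str, Any]], label: str = 'model') -> List[str]:
    """Return the names of the pods that carry the label, in input order."""
    return [
        pod['metadata']['name']
        for pod in pods
        if is_model_pod(pod['metadata'], label)
    ]
```

On the server side, the same effect can be had with a Kubernetes label selector consisting of just the key (`model`), which matches any object that has that label regardless of its value. That avoids opening log streams for unrelated pods in the first place.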

Such a TrainingJob custom resource can then be defined as follows:

spec:
  ………
  additionalPrinterColumns:
    - name: Accuracy
      type: string
      priority: 0
      JSONPath: .status.accuracy

When listed, the objects will print their accuracy:

$ kubectl get TrainingJob
NAME             ACCURACY
model-1          87.23%

nolar, Apr 26 '19 09:04