Add status updates and event recording to TestWorkflowExecution controller

Open Copilot opened this issue 4 months ago • 2 comments

Pull request description

TestWorkflowExecution resources fail silently when the controller encounters errors (e.g., API server unreachable). kubectl apply succeeds, but users get no status updates or events, which violates Kubernetes conventions for observability.

This PR implements the suggested approach from the issue analysis to add proper status updates and Kubernetes event recording to the TestWorkflowExecution controller.

Changes Made

Controller Changes (pkg/controller/testworkflowexecutionexecutor.go):

  • Changed reconciler from client.Reader to client.Client to enable status updates
  • Added event recorder parameter for emitting Kubernetes events
  • Update status with generation tracking before execution
  • Emit Warning events on execution failure with detailed error messages
  • Emit Normal events on successful execution start
  • Update status with error information when execution fails
  • Re-get object after status update to ensure consistency with fake client behavior in tests

Status Type Extension (api/testworkflows/v1/testworkflowexecution_types.go):

  • Added LastError field to TestWorkflowExecutionStatus to surface error messages
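
As a rough illustration of the new field, the sketch below shows a trimmed-down status struct and how it serializes. The lastError JSON name matches the regenerated CRD mentioned in this PR; the omitempty tags, the statusJSON helper, and the omission of the status type's other fields are assumptions made for brevity.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// TestWorkflowExecutionStatus here is a simplified sketch; the real type
// has additional fields that are omitted in this example.
type TestWorkflowExecutionStatus struct {
	// Generation is the last observed generation of the resource.
	Generation int64 `json:"generation,omitempty"`
	// LastError surfaces the most recent execution error, if any.
	LastError string `json:"lastError,omitempty"`
}

// statusJSON is a hypothetical helper that renders a status as JSON.
func statusJSON(s TestWorkflowExecutionStatus) string {
	b, err := json.Marshal(s)
	if err != nil {
		return ""
	}
	return string(b)
}

func main() {
	// With omitempty, a status without an error carries no lastError key.
	fmt.Println(statusJSON(TestWorkflowExecutionStatus{Generation: 1}))
	fmt.Println(statusJSON(TestWorkflowExecutionStatus{Generation: 1, LastError: "dial tcp: no such host"}))
}
```

An optional field that is dropped when empty lets existing objects and clients ignore it, which is one way a status addition like this stays backward compatible.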

CRD Regeneration (k8s/crd/testworkflows.testkube.io_testworkflowexecutions.yaml):

  • Regenerated CRD manifests to include the new lastError field
  • Includes formatting improvements from controller-gen

Test Updates (pkg/controller/testworkflowexecutionexecutor_test.go):

  • Updated mocks to use client.Client instead of client.Reader
  • Added status subresource support to fake client
  • Added test cases for status updates on success and failure
  • Added test cases for event recording
  • Uses standard library strings.Contains instead of custom helper functions
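
An event assertion built on strings.Contains might look like the sketch below. It is not taken from the PR's test file: hasEvent is a hypothetical helper, and the buffered string channel only imitates the way a fake event recorder can expose recorded events as formatted strings.

```go
package main

import (
	"fmt"
	"strings"
)

// hasEvent drains the channel without blocking and reports whether any
// event contains every given fragment. It returns on the first match.
func hasEvent(events <-chan string, fragments ...string) bool {
	for {
		select {
		case ev := <-events:
			match := true
			for _, f := range fragments {
				if !strings.Contains(ev, f) {
					match = false
					break
				}
			}
			if match {
				return true
			}
		default:
			// Nothing left to read: no matching event was recorded.
			return false
		}
	}
}

func main() {
	events := make(chan string, 10)
	events <- "Warning ExecutionFailed Failed to execute test workflow: dial tcp: no such host"
	fmt.Println(hasEvent(events, "Warning", "ExecutionFailed", "no such host"))
}
```

Matching on fragments rather than the full string keeps the assertion stable when the wrapped error text changes slightly.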

Example

After these changes, users get feedback through standard kubectl commands:

$ kubectl describe testworkflowexecution example
...
Status:
  Generation:  1
  Last Error:  Post "http://testkube-api-server.podinfo.svc.cluster.local:8088/v1/test-workflows/example/executions": dial tcp: no such host
Events:
  Type     Reason           Message
  ----     ------           -------
  Warning  ExecutionFailed  Failed to execute test workflow: Post "http://testkube-api-server..."

Checklist (choose what's happened)

  • [ ] breaking change! (describe)
  • [x] tested locally
  • [ ] tested on cluster
  • [ ] added new dependencies
  • [ ] updated the docs
  • [x] added a test

Breaking changes

None. Status field addition is backward compatible.

Changes

  • TestWorkflowExecution controller now updates status on success/failure
  • Kubernetes events emitted for execution lifecycle (started, failed)
  • Error messages visible in status without requiring log access
  • Added LastError field to TestWorkflowExecutionStatus for surfacing scheduling errors
  • Regenerated CRD manifests with new status field

Fixes

  • Fixes kubeshop/testkube#6819

Original prompt

This section details the original issue you should resolve

<issue_title>No status or events are reported on TestWorkflowExecution when there's an error</issue_title>
<issue_description>Describe the bug

testkube-operator-controller-manager-7fbddc58b-6klj4 manager {"L":"ERROR","T":"2025-11-06T21:03:05.920Z","M":"Reconciler error","controller":"testworkflowexecution","controllerGroup":"testworkflows.testkube.io","controllerKind":"TestWorkflowExecution","TestWorkflowExecution":{"name":"example","namespace":"podinfo"},"namespace":"podinfo","name":"example","reconcileID":"00efa27d-17dc-41d5-9b06-6279fc302476","error":"Post \"http://testkube-api-server.podinfo.svc.cluster.local:8088/v1/test-workflows/example/executions?testWorkflowExecutionName=example\": dial tcp: lookup testkube-api-server.podinfo.svc.cluster.local on 192.168.194.138:53: no such host","S":"sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).reconcileHandler\n\t/home/runner/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:341\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).processNextWorkItem\n\t/home/runner/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:288\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).Start.func2.2\n\t/home/runner/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:249"}

I created a TestWorkflowExecution in my namespace associated with this demo service (podinfo). The resource was successfully applied, but the status of the resource did not change, and there were no events associated with it. This lack of feedback is problematic for a couple of reasons:

  1. Since it applied successfully, it seems like something should eventually happen. But nothing will happen. Not providing feedback to the user via Status & Events is a k8s anti-pattern.
  2. How is someone supposed to debug this if they don't have access to the testkube controller to look at logs?

To Reproduce

Steps to reproduce the behavior:

  1. kubectl testkube init
  2. kubectl create namespace podinfo

Apply the following:

---
# https://docs.testkube.io/articles/test-workflows-examples-basics
apiVersion: testworkflows.testkube.io/v1
kind: TestWorkflow
metadata:
  name: example
  namespace: podinfo
spec:
  steps:
  - run:
      image: "curlimages/curl:7.78.0"
      args:
      - "http://my-podinfo.default.svc.cluster.local:9898"
---
apiVersion: testworkflows.testkube.io/v1
kind: TestWorkflowExecution
metadata:
  name: example
spec:
  testWorkflow:
    name: example
  executionRequest:
    config: {}

Notice that apply will succeed, but nothing will actually happen. No status updates and no events.

kubectl describe TestWorkflowExecution example

Name:         example
Namespace:    podinfo
Labels:       <none>
Annotations:  argocd.argoproj.io/sync-options: Replace=true
              argocd.argoproj.io/tracking-id: testkube-example:testworkflows.testkube.io/TestWorkflowExecution:podinfo/example
API Version:  testworkflows.testkube.io/v1
Kind:         TestWorkflowExecution
Metadata:
  Creation Timestamp:  2025-11-06T20:24:34Z
  Generation:          2
  Resource Version:    7444
  UID:                 8d3c5dff-174c-4c82-9cda-e6d007485992
Spec:
  Execution Request:
    Config:
  Test Workflow:
    Name:  example
Events:    <none>

Expected behavior

Any sort of feedback about what's happening (good or bad) via native Kubernetes channels (status & events)!

Version / Cluster

  • Which testkube version? 2.40
  • What Kubernetes cluster? Orbstack
  • What Kubernetes version?
k version
Client Version: v1.33.3
Kustomize Version: v5.6.0
Server Version: v1.32.6+orb1

Screenshots

If applicable, add CLI commands/output to help explain your problem.

Additional context

Add any other context about the problem here.
</issue_description>

<agent_instructions>analyse the problem and provide a suggestion for how to solve this without making any actual code changes </agent_instructions>

Comments on the Issue (you are @copilot in this section)

  • Fixes kubeshop/testkube#6819

Copilot, Nov 06 '25 21:11

@copilot please implement the suggested approach

olensmar, Nov 07 '25 01:11

@copilot please implement the suggested approach

Implemented the suggested approach. The controller now:

  • Updates TestWorkflowExecution status with generation tracking and error messages
  • Emits Kubernetes events for both success and failure cases
  • Uses client.Client (read-write) instead of client.Reader

All changes committed in 32e9837. Tests pass and CRD manifests have been regenerated.

Copilot, Nov 07 '25 01:11