Add early cluster validation to prevent wasted build time in func deploy

Open RayyanSeliya opened this issue 1 month ago • 7 comments

Changes

  • :bug: Add early cluster validation to prevent wasted build time in func deploy
  • :broom: Improve error messages for cluster connection failures with actionable guidance

What Changed

func deploy now validates Kubernetes cluster connectivity before starting the container build process, preventing wasted time when the cluster is inaccessible.

Before:

  • Built container image first (2-5 minutes)
  • Then failed with confusing errors like "invalid run-image" or "context canceled"
  • Different errors depending on whether --build=false was used

After:

  • Validates cluster connection immediately (< 5 seconds)
  • Fails fast with clear, specific error messages
  • Consistent behavior regardless of build flags
  • Provides actionable guidance for each error type

Implementation

Added 2-layer error handling:

  • Layer 1 (pkg/functions/errors.go): Technical errors (ErrInvalidKubeconfig, ErrClusterNotAccessible)
  • Layer 2 (cmd/deploy.go): User-friendly CLI messages with examples

Detects three distinct error scenarios:

  1. Invalid kubeconfig file path
  2. Empty/no cluster configuration
  3. Cluster unreachable (network, auth, down, etc.)
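
For readers who want a picture of the two layers, here is a minimal sketch, not the code from this PR. Only the names ErrInvalidKubeconfig and ErrClusterNotAccessible come from the description above; the helper checkKubeconfigPath, the function userMessage, and the message texts are hypothetical. It covers scenario 1 only, since the description does not say how the empty-configuration case is mapped.

```go
package main

import (
	"errors"
	"fmt"
	"os"
)

// Layer 1: technical sentinel errors. The names come from the PR description;
// the message strings are assumptions.
var (
	ErrInvalidKubeconfig    = errors.New("invalid kubeconfig")
	ErrClusterNotAccessible = errors.New("cluster not accessible")
)

// Hypothetical CLI texts; the real wording in cmd/deploy.go may differ.
const (
	msgInvalidKubeconfig    = "The KUBECONFIG path does not point to an existing file. Check the path or unset KUBECONFIG."
	msgClusterNotAccessible = "Cannot reach the cluster. Check that it is running and that your credentials are valid."
)

// checkKubeconfigPath is a hypothetical helper for scenario 1: it reports
// ErrInvalidKubeconfig when an explicitly set path does not exist.
func checkKubeconfigPath(path string) error {
	if path == "" {
		return nil // nothing set explicitly; defaults apply
	}
	if _, err := os.Stat(path); err != nil {
		// %w keeps the sentinel matchable with errors.Is.
		return fmt.Errorf("%w: %v", ErrInvalidKubeconfig, err)
	}
	return nil
}

// Layer 2: map technical errors to user-facing text with errors.Is, so the
// CLI does not have to match on error strings.
func userMessage(err error) string {
	switch {
	case err == nil:
		return ""
	case errors.Is(err, ErrInvalidKubeconfig):
		return msgInvalidKubeconfig
	case errors.Is(err, ErrClusterNotAccessible):
		return msgClusterNotAccessible
	default:
		return err.Error()
	}
}

func main() {
	err := checkKubeconfigPath("/no/such/dir/kubeconfig")
	fmt.Println(userMessage(err))
}
```

Keeping the sentinels in one package and the wording in another means the CLI text can change without touching the code that detects the failure.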

Testing

Tested all combinations:

  • ✅ Invalid KUBECONFIG path (with/without build flags)
  • ✅ Empty kubeconfig (with/without build flags)
  • ✅ Unreachable cluster (with/without build flags)
  • ✅ Cluster stopped after configuration (network error)
  • ✅ Valid cluster with kind (success path)
  • ✅ Unit tests pass (TestDeploy_ConfigPrecedence, etc.)

/kind bug

Fixes #3116

Release Note

`func deploy` now validates cluster connectivity before building, providing immediate feedback with clear error messages instead of wasting time on builds that will fail deployment.

RayyanSeliya avatar Oct 19 '25 14:10 RayyanSeliya

Hi @RayyanSeliya. Thanks for your PR.

I'm waiting for a github.com member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

knative-prow[bot] avatar Oct 19 '25 14:10 knative-prow[bot]

Codecov Report

:x: Patch coverage is 12.16216% with 65 lines in your changes missing coverage. Please review.
:white_check_mark: Project coverage is 63.14%. Comparing base (63026ce) to head (c4652dd).
:warning: Report is 4 commits behind head on main.

Files with missing lines    Patch %    Lines
cmd/deploy.go               0.00%      38 Missing :warning:
pkg/knative/deployer.go     24.00%     16 Missing and 3 partials :warning:
pkg/knative/client.go       27.27%     4 Missing and 4 partials :warning:

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3117      +/-   ##
==========================================
+ Coverage   59.91%   63.14%   +3.22%     
==========================================
  Files         150      150              
  Lines       13353    13501     +148     
==========================================
+ Hits         8001     8525     +524     
+ Misses       4416     3971     -445     
- Partials      936     1005      +69     
Flag                 Coverage Δ
e2e-tests            42.29% <12.16%> (+11.05%) :arrow_up:
integration-tests    57.60% <12.16%> (+1.21%) :arrow_up:
unit-tests           49.46% <0.00%> (-0.52%) :arrow_down:

codecov[bot] avatar Oct 19 '25 14:10 codecov[bot]

/ok-to-test

gauron99 avatar Oct 19 '25 15:10 gauron99

I'm not much of a fan of this implementation... it's still "test aware", as Matej said, with those literal strings and constants. Also, there has to be a better way to implement this :thinking: Maybe as a method on the client instance (in pkg/functions/client.go)? The instance would have a method like "CurrentState", "State", "Available", or something like that? @lkingland @matejvasek

gauron99 avatar Oct 21 '25 15:10 gauron99

@gauron99 Good point about the client architecture! I checked pkg/functions/client.go and saw it already has methods like Deploy(), Build(), etc.

I'm thinking we could add:

func (c *Client) ClusterAvailable(ctx context.Context) error

Then just call it before building. Tests would naturally skip it since they use NewTestClient() with mocks.

The challenge is timing - we currently validate before creating the client (line 423 vs line 462 in cmd/deploy.go's runDeploy()). We'd need to either:

  1. Create the client earlier, or
  2. Make it a standalone function like functions.ValidateCluster()

Which direction makes more sense for the existing code? I don't want to refactor in a way that isn't feasible.

@lkingland @matejvasek

RayyanSeliya avatar Oct 22 '25 15:10 RayyanSeliya

This one is a little tricky... I understand the desire to validate in the CLI to "fail fast". But too much pre-validation creates a hard dependency between the CLI and the deployer.

I think it might be worth looking into placing this validation in the deployer implementation itself, returning a typed error on failure, and then capturing this error in the CLI and adding "CLI specific" help text as necessary.

Remember that it's ok if the system builds and then fails on deployment, because it should not repeat the build on a subsequent deploy (it detects the build is "fresh").
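
As an illustration of the typed-error idea, here is a minimal sketch and not the project's actual code. ClusterUnreachableError, the Deployer interface, downDeployer, cliError, the host URL, and the hint text are all hypothetical names and values chosen for the example.

```go
package main

import (
	"errors"
	"fmt"
)

// ClusterUnreachableError is a hypothetical typed error a deployer could
// return when it cannot reach the cluster.
type ClusterUnreachableError struct {
	Host string
	Err  error
}

func (e *ClusterUnreachableError) Error() string {
	return fmt.Sprintf("cluster %s unreachable: %v", e.Host, e.Err)
}

// Unwrap exposes the underlying cause to errors.Is and errors.As.
func (e *ClusterUnreachableError) Unwrap() error { return e.Err }

// Deployer is a reduced, hypothetical interface; the real one differs.
type Deployer interface {
	Deploy(image string) error
}

// downDeployer simulates a deployer whose cluster is down.
type downDeployer struct{}

func (downDeployer) Deploy(string) error {
	return &ClusterUnreachableError{
		Host: "https://127.0.0.1:6443",
		Err:  errors.New("connection refused"),
	}
}

// clusterHint is CLI-only help text; the deployer knows nothing about it.
const clusterHint = "hint: verify the cluster is running and KUBECONFIG points to it"

// cliError adds CLI-specific guidance when the deployer reports the typed
// error and passes every other error through unchanged.
func cliError(err error) error {
	var unreachable *ClusterUnreachableError
	if errors.As(err, &unreachable) {
		return fmt.Errorf("%w\n%s", err, clusterHint)
	}
	return err
}

func main() {
	var d Deployer = downDeployer{}
	fmt.Println(cliError(d.Deploy("registry.example/fn:latest")))
}
```

The dependency points one way only: the CLI knows the deployer's error type, while the deployer has no knowledge of the CLI or its wording.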

lkingland avatar Nov 03 '25 11:11 lkingland

Thanks for the feedback @lkingland, that makes sense. Moving the validation into the deployer itself with typed errors sounds good, with the CLI just catching those and providing user-friendly errors! You can have a look now; ping me if any more changes are needed!

RayyanSeliya avatar Nov 05 '25 16:11 RayyanSeliya

@RayyanSeliya: GitHub didn't allow me to request PR reviews from the following users: take, when, some, moments, pls, a, look, have.

Note that only knative members and repo collaborators can review this PR, and authors cannot review their own PRs.

In response to this:

/cc @lkingland pls take a look when have some moments

knative-prow[bot] avatar Nov 09 '25 17:11 knative-prow[bot]

/cc @lkingland

RayyanSeliya avatar Nov 09 '25 17:11 RayyanSeliya

Hey @lkingland, I don't know why the tests are failing!

RayyanSeliya avatar Nov 11 '25 08:11 RayyanSeliya

/override ?

gauron99 avatar Nov 11 '25 09:11 gauron99

@gauron99: /override requires failed status contexts, check run or a prowjob name to operate on. The following unknown contexts/checkruns were given:

  • ?

Only the following failed contexts/checkruns were expected:

  • E2E Test (ubuntu-latest, springboot)
  • EasyCLA
  • On Cluster RT Test (ubuntu-latest, pack)
  • style / suggester / github_actions
  • style / suggester / shell
  • style / suggester / yaml
  • tide
  • unit-tests_func_main

If you are trying to override a checkrun that has a space in it, you must put a double quote on the context.

In response to this:

/override ?

knative-prow[bot] avatar Nov 11 '25 09:11 knative-prow[bot]

/override "On Cluster RT Test (ubuntu-latest, pack)" "E2E Test (ubuntu-latest, springboot)"

gauron99 avatar Nov 11 '25 09:11 gauron99

@gauron99: Overrode contexts on behalf of gauron99: E2E Test (ubuntu-latest, springboot), On Cluster RT Test (ubuntu-latest, pack)

In response to this:

/override "On Cluster RT Test (ubuntu-latest, pack)" "E2E Test (ubuntu-latest, springboot)"

knative-prow[bot] avatar Nov 11 '25 09:11 knative-prow[bot]

These failures are unrelated. It's an issue with our custom pack builder currently.

gauron99 avatar Nov 11 '25 09:11 gauron99

/lgtm
/approve

gauron99 avatar Nov 11 '25 09:11 gauron99

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: gauron99, lkingland, RayyanSeliya

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • ~~OWNERS~~ [gauron99,lkingland]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

knative-prow[bot] avatar Nov 11 '25 09:11 knative-prow[bot]

Yeah! 👍 Thanks.

RayyanSeliya avatar Nov 11 '25 09:11 RayyanSeliya