Print warning instead of error in case of unstable cluster
Fixes: Issue #4284
Solution/Idea
Since the code doesn't exit at this point, the error-level message might be confusing to users. Changed it to a warning.
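A minimal sketch of the idea in Go (names are stand-ins; crc uses its own logging package, the standard library `log` is used here only to keep the example self-contained):

```go
package main

import (
	"errors"
	"log"
	"time"
)

// waitForClusterStable is a stand-in for crc's real readiness check
// in the machine start path; the actual function name differs.
func waitForClusterStable(timeout time.Duration) error {
	return errors.New("cluster operators are still not stable after " + timeout.String())
}

func main() {
	// start does not abort when the cluster is merely unstable, so the
	// message is logged as a warning rather than an error.
	if err := waitForClusterStable(10 * time.Minute); err != nil {
		log.Printf("WARN Cluster is not ready: %v", err) // previously error level
	}
	// ... start continues with the kubeconfig update and connection details.
}
```

The `crc start` output below shows the resulting behavior: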
```
INFO Operator network is progressing
INFO Operator network is progressing
INFO Operator network is progressing
INFO Operator network is progressing
INFO Operator network is progressing
INFO Operator network is progressing
INFO Operator network is progressing
WARN Cluster is not ready: cluster operators are still not stable after 10m0.695631268s
INFO Adding crc-admin and crc-developer contexts to kubeconfig...
ERRO Cannot update kubeconfig: Head "https://oauth-openshift.apps-crc.testing:443": read tcp 127.0.0.1:60782->127.0.0.1:443: read: connection reset by peer
Started the OpenShift cluster.

The server is accessible via web console at:
  https://console-openshift-console.apps-crc.testing

Log in as administrator:
  Username: kubeadmin
  Password: 3NM8K-C5kvg-YTRW4-FhiUM

Log in as user:
  Username: developer
  Password: developer

Use the 'oc' command line interface:
  $ eval $(crc oc-env)
  $ oc login -u developer https://api.crc.testing:6443
```
Testing
Run `crc start` in a state where the cluster operators do not become ready within the timeout. I reproduced this by cordoning the cluster node and running start again, as sketched below.
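For reference, one possible repro recipe following that description (assumed steps, not necessarily the exact commands used):

```
$ eval $(crc oc-env)
$ oc get nodes               # find the single CRC node name
$ oc adm cordon <node-name>  # mark the node unschedulable
$ crc stop
$ crc start                  # the readiness wait should now hit the timeout
```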
@vyasgun: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:
| Test name | Commit | Details | Required | Rerun command |
|---|---|---|---|---|
| ci/prow/security | 65d4041f31589d077638514a39aeeb0641cc0b8f | link | false | /test security |
@vyasgun can you check whether the exit code is non-zero in the case where ERRO Cannot update kubeconfig: Head "https://oauth-openshift.apps-crc.testing:443": read tcp 127.0.0.1:60782->127.0.0.1:443: read: connection reset by peer is printed but Started the OpenShift cluster. still follows?
@praveenkumar No. Do we want it to return a non-zero exit code?
For any error, yes, we should return non-zero; otherwise we should change it to a warning. But I think if we are not able to update the kubeconfig file, we should tell the user how they can still access the cluster.
@praveenkumar In the Start function, only two logging.Errorf() statements are used, and neither is followed by a non-zero return.
Also, kubeconfig is updated in other places inside the function, and all of them except the last one return an error. For example:
https://github.com/crc-org/crc/blob/main/pkg/crc/machine/start.go#L603
https://github.com/crc-org/crc/blob/main/pkg/crc/machine/start.go#L528
I'm not sure if there is a particular reason for these statements and the differences (or if it's just an oversight). Additionally, I think if updating kubeconfig is grounds for a non-zero return, so is an unstable cluster (to indicate a failure in Start). We should return these errors at the end and put any extra processing that might still be required in a defer so it's always executed, as in the sketch below.
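A minimal sketch of that pattern (all function names here are stand-ins, not the actual crc code):

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// Stubs standing in for the real start steps; all names are assumed.
func waitForClusterStable(ctx context.Context) error { return nil }
func updateKubeconfig() error                        { return errors.New("connection reset by peer") }
func printConnectionDetails()                        { fmt.Println("The server is accessible via web console at: ...") }

// start returns its errors to the caller instead of only logging them;
// the connection details run in a defer so they print even on failure.
func start(ctx context.Context) (err error) {
	defer printConnectionDetails()

	if err = waitForClusterStable(ctx); err != nil {
		return fmt.Errorf("cluster is not ready: %w", err)
	}
	if err = updateKubeconfig(); err != nil {
		return fmt.Errorf("cannot update kubeconfig: %w", err)
	}
	return nil
}

func main() {
	if err := start(context.Background()); err != nil {
		fmt.Println("error:", err) // the caller can then exit non-zero
	}
}
```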
> Additionally, I think if updating kubeconfig is grounds for a non-zero return, so is an unstable cluster (to indicate a failure in Start).
We could return a different error code in each case when crc completes. With different error codes, we might want to ignore 'cluster unstable'.
The 'cannot update kubeconfig' message deserves to be made a lot more user-friendly :) Explain what won't work when this fails (I think this only means kube contexts can't be used, and that an explicit login to the cluster will be needed).
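A sketch of what distinct exit codes could look like (hypothetical sentinel errors and code values, not crc's actual ones):

```go
package main

import (
	"errors"
	"fmt"
	"os"
)

// Hypothetical sentinel errors; crc's real error values may differ.
var (
	errClusterUnstable  = errors.New("cluster unstable")
	errKubeconfigUpdate = errors.New("cannot update kubeconfig")
)

// exitCodeFor maps start errors to distinct exit codes so scripts can
// choose to ignore the 'cluster unstable' case.
func exitCodeFor(err error) int {
	switch {
	case err == nil:
		return 0
	case errors.Is(err, errClusterUnstable):
		return 2
	case errors.Is(err, errKubeconfigUpdate):
		return 3
	default:
		return 1
	}
}

func main() {
	err := fmt.Errorf("start: %w", errClusterUnstable)
	os.Exit(exitCodeFor(err))
}
```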
Yes, it should tell the user how to access the cluster (like export KUBECONFIG=$HOME/.crc/machine/crc/kubeconfig or use oc --kubeconfig=$HOME/.crc/machine/crc/kubeconfig). They should be able to debug or check which cluster operator is not in the Available state, as shown below.
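For instance, the improved message could point directly at the bundled kubeconfig (path as quoted above; `oc get clusteroperators` is the standard way to list operator status):

```
$ export KUBECONFIG=$HOME/.crc/machine/crc/kubeconfig
$ oc get clusteroperators    # check which operators are not yet Available
```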
We probably can get this in, and create follow-up issues for the various improvements that have been discussed during the review?
@vyasgun Can you create the follow-up issue that was discussed here? Once the follow-up issue is created, we can merge this one.
Created a follow up issue: https://github.com/crc-org/crc/issues/4395