Print warning instead of error in case of unstable cluster
Fixes: Issue #4284
Solution/Idea
Since the code doesn't exit at this point, the error-level message might be confusing to users. Changed it to a warning.
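A minimal sketch of the idea in Go (names are stand-ins; crc uses its own logging package, the standard library `log` is used here only to keep the example self-contained):

```go
package main

import (
	"errors"
	"log"
	"time"
)

// waitForClusterStable is a stand-in for crc's real readiness check
// in the machine start path; the actual function name differs.
func waitForClusterStable(timeout time.Duration) error {
	return errors.New("cluster operators are still not stable after " + timeout.String())
}

func main() {
	// start does not abort when the cluster is merely unstable, so the
	// message is logged as a warning rather than an error.
	if err := waitForClusterStable(10 * time.Minute); err != nil {
		log.Printf("WARN Cluster is not ready: %v", err) // previously error level
	}
	// ... start continues with the kubeconfig update and connection details.
}
```

The `crc start` output below shows the resulting behavior: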
```
INFO Operator network is progressing
INFO Operator network is progressing
INFO Operator network is progressing
INFO Operator network is progressing
INFO Operator network is progressing
INFO Operator network is progressing
INFO Operator network is progressing
WARN Cluster is not ready: cluster operators are still not stable after 10m0.695631268s
INFO Adding crc-admin and crc-developer contexts to kubeconfig...
ERRO Cannot update kubeconfig: Head "https://oauth-openshift.apps-crc.testing:443": read tcp 127.0.0.1:60782->127.0.0.1:443: read: connection reset by peer
Started the OpenShift cluster.

The server is accessible via web console at:
  https://console-openshift-console.apps-crc.testing

Log in as administrator:
  Username: kubeadmin
  Password: 3NM8K-C5kvg-YTRW4-FhiUM

Log in as user:
  Username: developer
  Password: developer

Use the 'oc' command line interface:
  $ eval $(crc oc-env)
  $ oc login -u developer https://api.crc.testing:6443
```
Testing
Run `crc start` in a state where the cluster operators do not become ready within the timeout. I reproduced this by cordoning the cluster node and running start again, as sketched below.
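For reference, one possible repro recipe following that description (assumed steps, not necessarily the exact commands used):

```
$ eval $(crc oc-env)
$ oc get nodes               # find the single CRC node name
$ oc adm cordon <node-name>  # mark the node unschedulable
$ crc stop
$ crc start                  # the readiness wait should now hit the timeout
```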
@vyasgun: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:
| Test name | Commit | Details | Required | Rerun command |
|---|---|---|---|---|
| ci/prow/security | 65d4041f31589d077638514a39aeeb0641cc0b8f | link | false | /test security |
@vyasgun can you check whether the exit code is non-zero in the case where ERRO Cannot update kubeconfig: Head "https://oauth-openshift.apps-crc.testing:443": read tcp 127.0.0.1:60782->127.0.0.1:443: read: connection reset by peer is printed but Started the OpenShift cluster. still follows?
@praveenkumar No. Do we want it to return a non-zero exit code?
For any error, yes, we should return non-zero; otherwise we should change it to a warning. But I think if we are not able to update the kubeconfig file, we should tell the user how they can still access the cluster.
@praveenkumar In the Start function, only two logging.Errorf() statements are used, and neither is followed by a non-zero return.
Also, kubeconfig is updated in other places inside the function, and all of them except the last one return an error. For example:
https://github.com/crc-org/crc/blob/main/pkg/crc/machine/start.go#L603
https://github.com/crc-org/crc/blob/main/pkg/crc/machine/start.go#L528
I'm not sure if there is a particular reason for these statements and the differences (or if it's just an oversight). Additionally, I think if updating kubeconfig is grounds for a non-zero return, so is an unstable cluster (to indicate a failure in Start). We should return these errors at the end and put any extra processing that might still be required in a defer so it's always executed, as in the sketch below.
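A minimal sketch of that pattern (all function names here are stand-ins, not the actual crc code):

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// Stubs standing in for the real start steps; all names are assumed.
func waitForClusterStable(ctx context.Context) error { return nil }
func updateKubeconfig() error                        { return errors.New("connection reset by peer") }
func printConnectionDetails()                        { fmt.Println("The server is accessible via web console at: ...") }

// start returns its errors to the caller instead of only logging them;
// the connection details run in a defer so they print even on failure.
func start(ctx context.Context) (err error) {
	defer printConnectionDetails()

	if err = waitForClusterStable(ctx); err != nil {
		return fmt.Errorf("cluster is not ready: %w", err)
	}
	if err = updateKubeconfig(); err != nil {
		return fmt.Errorf("cannot update kubeconfig: %w", err)
	}
	return nil
}

func main() {
	if err := start(context.Background()); err != nil {
		fmt.Println("error:", err) // the caller can then exit non-zero
	}
}
```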
> Additionally, I think if updating kubeconfig is grounds for a non-zero return, so is an unstable cluster (to indicate a failure in Start).
We could return a different error code in each case when crc completes. With different error codes, we might want to ignore 'cluster unstable'.
The 'cannot update kubeconfig' message deserves to be made a lot more user-friendly :) Explain what won't work when this fails (I think this only means kube contexts can't be used, and that an explicit login to the cluster will be needed).
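A sketch of what distinct exit codes could look like (hypothetical sentinel errors and code values, not crc's actual ones):

```go
package main

import (
	"errors"
	"fmt"
	"os"
)

// Hypothetical sentinel errors; crc's real error values may differ.
var (
	errClusterUnstable  = errors.New("cluster unstable")
	errKubeconfigUpdate = errors.New("cannot update kubeconfig")
)

// exitCodeFor maps start errors to distinct exit codes so scripts can
// choose to ignore the 'cluster unstable' case.
func exitCodeFor(err error) int {
	switch {
	case err == nil:
		return 0
	case errors.Is(err, errClusterUnstable):
		return 2
	case errors.Is(err, errKubeconfigUpdate):
		return 3
	default:
		return 1
	}
}

func main() {
	err := fmt.Errorf("start: %w", errClusterUnstable)
	os.Exit(exitCodeFor(err))
}
```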
Yes, it should tell the user how to access the cluster (like export KUBECONFIG=$HOME/.crc/machine/crc/kubeconfig or use oc --kubeconfig=$HOME/.crc/machine/crc/kubeconfig). They should be able to debug or check which cluster operator is not in the Available state, as shown below.
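For instance, the improved message could point directly at the bundled kubeconfig (path as quoted above; `oc get clusteroperators` is the standard way to list operator status):

```
$ export KUBECONFIG=$HOME/.crc/machine/crc/kubeconfig
$ oc get clusteroperators    # check which operators are not yet Available
```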
We probably can get this in, and create follow-up issues for the various improvements that have been discussed during the review?
@vyasgun Can you create the follow-up issue that was discussed here? Once the follow-up issue is created, we can merge this one.
Created a follow up issue: https://github.com/crc-org/crc/issues/4395