crc crashes after 20-30 minutes of starting and upon re-start it starts in degraded mode
Hello,
- Hosting OS: RHEL 8.10
- RAM: 64 GB
- CPU: 16
- libvirt version: 8.0.0-23.3
- CRC version: 2.51.0+80aa80
- OpenShift version: 4.18.2
- MicroShift version: 4.18.2
This issue happens with both crc 2.49 and 2.51.
Steps to reproduce:
- download and install crc; for 2.49 I used https://developers.redhat.com/content-gateway/rest/mirror/pub/openshift-v4/clients/crc/2.49.0 and for 2.51 I used https://developers.redhat.com/content-gateway/file/pub/openshift-v4/clients/crc/2.51.0/
- execute : crc config set cpus 12 ; crc config set memory 30720
- execute : crc setup
- execute : crc start
- now I also install the following dependencies for my app :
yum install -y git-core podman
curl https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3 | bash
curl -L https://istio.io/downloadIstio | sh -
istioctl install --set profile=openshift --set values.pilot.env.ENABLE_TLS_ON_SIDECAR_INGRESS=true --set components.cni.enabled=true --set values.cni.repair.deletePods="true"
Install "yq" from https://github.com/mikefarah/yq/releases/download/4.42.1/${BINARY}.tar.gz
- add the following to /etc/hosts :
127.0.0.1 api.crc.testing canary-openshift-ingress-canary.apps-crc.testing console-openshift-console.apps-crc.testing default-route-openshift-image-registry.apps-crc.testing downloads-openshift-console.apps-crc.testing host.crc.testing oauth-openshift.apps-crc.testing
127.0.0.1 controlplane.obpee00.com openldap.obpee00.com
- also add those entries on my client machine, but with the external IP of the VM instead of "127.0.0.1"
- now I install an application developed by my company; the application runs and I can access it at "https://controlplane.obpee00.com" from my client machine
- at this point I can still see the following ports opened by crc on the VM which hosts it :
sudo ss -lnpt | grep crc
LISTEN 0 2048 127.0.0.1:6443 0.0.0.0:* users:(("crc",pid=322571,fd=13))
LISTEN 0 2048 127.0.0.1:2222 0.0.0.0:* users:(("crc",pid=322571,fd=11))
LISTEN 0 2048 *:443 *:* users:(("crc",pid=322571,fd=14))
LISTEN 0 2048 *:80 *:* users:(("crc",pid=322571,fd=15))
- wait 20-30 minutes, after which "crc status" shows :
crc status
CRC VM: Running
OpenShift: Unreachable (v4.18.2)
Disk Usage: 0B of 0B (Inside the CRC VM)
Cache Usage: 28.13GB
Cache Directory: /home/azureuser/.crc/cache
- there are no more ports opened by the crc daemon in the output of "sudo ss -lnpt"
- after stopping crc and re-starting it I can see the following :
crc start
...
INFO 2 operators are progressing: image-registry, network
INFO 2 operators are progressing: image-registry, network
INFO 2 operators are progressing: image-registry, network
INFO 2 operators are progressing: image-registry, network
INFO 2 operators are progressing: image-registry, network
INFO 2 operators are progressing: image-registry, network
WARN Cluster is not ready: cluster operators are still not stable after 10m1.500421844s
INFO Adding crc-admin and crc-developer contexts to kubeconfig...
Started the OpenShift cluster.
...
- now the web page at "console-openshift-console.apps-crc.testing" is reachable, but the one at "https://controlplane.obpee00.com" is not working. Debugging with curl from my client machine I see :
$ curl -k https://controlplane.obpee00.com -vv
14:40:26.529000 [0-0] * Host controlplane.obpee00.com:443 was resolved.
14:40:26.532000 [0-0] * IPv6: (none)
14:40:26.533000 [0-0] * IPv4: 52.188.186.103
14:40:26.535000 [0-0] * [HTTPS-CONNECT] created with 1 ALPNs -> 0
14:40:26.537000 [0-0] * [HTTPS-CONNECT] added
14:40:26.539000 [0-0] * [HTTPS-CONNECT] connect, init
14:40:26.541000 [0-0] * Trying 52.188.186.103:443...
14:40:26.543000 [0-0] * [HTTPS-CONNECT] connect -> 0, done=0
14:40:26.545000 [0-0] * [HTTPS-CONNECT] adjust_pollset -> 1 socks
14:40:26.546000 [0-0] * [HTTPS-CONNECT] connect -> 0, done=0
14:40:26.548000 [0-0] * [HTTPS-CONNECT] adjust_pollset -> 1 socks
14:40:26.682000 [0-0] * schannel: disabled automatic use of client certificate
14:40:26.696000 [0-0] * ALPN: curl offers http/1.1
14:40:26.702000 [0-0] * [HTTPS-CONNECT] connect -> 0, done=0
14:40:26.705000 [0-0] * [HTTPS-CONNECT] adjust_pollset -> 1 socks
14:40:26.835000 [0-0] * schannel: failed to receive handshake, SSL/TLS connection failed
14:40:26.844000 [0-0] * [HTTPS-CONNECT] connect, all failed
14:40:26.849000 [0-0] * [HTTPS-CONNECT] connect -> 35, done=0
14:40:26.854000 [0-0] * closing connection #0
14:40:26.856000 [0-0] * [HTTPS-CONNECT] close
14:40:26.858000 [0-0] * [SETUP] close
14:40:26.860000 [0-0] * [SETUP] destroy
14:40:26.861000 [0-0] * [HTTPS-CONNECT] destroy
curl: (35) schannel: failed to receive handshake, SSL/TLS connection failed
I have attached the contents of "crc.log" and "crcd.log"
Please let me know if there are more steps to debug.
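To narrow down when the listeners disappear, a small helper along these lines could be left running next to crc. This is only a sketch: the function names, the log path and the 60 second interval are hypothetical, and it assumes the daemon keeps showing up as "crc" in the `ss` output, as it does in the listing above.

```shell
#!/usr/bin/env bash
# Count listening sockets owned by a process named "crc" in `ss -lnpt` output.
# The ss output is read from stdin so captured text can be fed in as well.
count_crc_listeners() {
    # -F: fixed string, -c: print the number of matching lines.
    # grep exits 1 when nothing matches, so keep the function's status at 0.
    grep -cF 'users:(("crc",' || true
}

# Hypothetical watcher: append a timestamped listener count to a log file.
# $1 = log file (default ./crc-ports.log), $2 = interval in seconds (default 60).
watch_crc_listeners() {
    local log_file="${1:-./crc-ports.log}"
    local interval="${2:-60}"
    while true; do
        printf '%s listeners=%s\n' "$(date -u +%FT%TZ)" \
            "$(sudo ss -lnpt | count_crc_listeners)" >> "$log_file"
        sleep "$interval"
    done
}
```

Running `watch_crc_listeners /tmp/crc-ports.log 60 &` before reproducing would give a timestamp for the moment the count drops to 0, which can then be matched against "crc.log" and "crcd.log".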
I tracked down the pod which was supposed to listen on https://controlplane.obpee00.com/
kubectl -n istio-system get pod istiod-9c7f9cb4f-5snlj -o yaml | grep ^status: -A30
status:
conditions:
- lastProbeTime: null
lastTransitionTime: "2025-05-30T11:27:06Z"
**message: '0/1 nodes are available: 1 node(s) had untolerated taint {node.kubernetes.io/disk-pressure:
}. preemption: 0/1 nodes are available: 1 Preemption is not helpful for scheduling.'**
reason: Unschedulable
status: "False"
type: PodScheduled
phase: Pending
qosClass: Burstable
Strange, I would say, because the node is UP :
kubectl get nodes -n istio-system
NAME STATUS ROLES AGE VERSION
crc Ready control-plane,master,worker 74d v1.31.6
Not sure if this is a crc-only issue or a general Kubernetes one, but I expect pods to be schedulable if the node is ready.
What do you think?
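For context, in Kubernetes "Ready" and "DiskPressure" are separate node conditions. A node can report Ready in "kubectl get nodes" while also reporting DiskPressure=True, in which case it carries the node.kubernetes.io/disk-pressure taint and new pods without a matching toleration stay Pending. A rough sketch for checking this follows; the function names are hypothetical and the node name "crc" is taken from the output above.

```shell
#!/usr/bin/env bash
# Print every condition of a node as TYPE=STATUS, one per line.
node_conditions() {
    local node="${1:-crc}"
    kubectl get node "$node" \
        -o jsonpath='{range .status.conditions[*]}{.type}={.status}{"\n"}{end}'
}

# Exit 0 if the node currently reports DiskPressure=True, non-zero otherwise.
has_disk_pressure() {
    node_conditions "${1:-crc}" | grep -qx 'DiskPressure=True'
}

# Print the taints currently set on the node as KEY:EFFECT, one per line.
node_taints() {
    local node="${1:-crc}"
    kubectl get node "$node" \
        -o jsonpath='{range .spec.taints[*]}{.key}:{.effect}{"\n"}{end}'
}
```

For example, `has_disk_pressure crc && node_taints crc` prints the taints only while the condition is set. If it is, "kubectl describe node crc" should also list the condition and related events, and freeing space inside the VM or raising the crc "disk-size" setting may be worth trying.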
Issues go stale after 90d of inactivity.
Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.
If this issue is safe to close now please do so with /close.
/lifecycle stale
Stale issues rot after 30d of inactivity.
Mark the issue as fresh by commenting /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
Exclude this issue from closing by commenting /lifecycle frozen.
If this issue is safe to close now please do so with /close.
/lifecycle rotten
/remove-lifecycle stale
Rotten issues close after 30d of inactivity.
Reopen the issue by commenting /reopen.
Mark the issue as fresh by commenting /remove-lifecycle rotten.
Exclude this issue from closing again by commenting /lifecycle frozen.
/close
@openshift-bot: Closing this issue.