crc crashes after 20-30 minutes of starting and upon re-start it starts in degraded mode

Open cozsmin opened this issue 5 months ago • 1 comments

Hello,

  • Hosting OS: RHEL 8.10
  • RAM: 64 GB
  • CPU: 16
  • libvirt version: 8.0.0-23.3
  • CRC version: 2.51.0+80aa80
  • OpenShift version: 4.18.2
  • MicroShift version: 4.18.2

This issue happens in both crc 2.49 and 2.51.

Steps to reproduce:

  • download and install crc (for 2.49 I used https://developers.redhat.com/content-gateway/rest/mirror/pub/openshift-v4/clients/crc/2.49.0 ; for 2.51 I used https://developers.redhat.com/content-gateway/file/pub/openshift-v4/clients/crc/2.51.0/ )
  • execute : crc config set cpus 12 ; crc config set memory 30720
  • execute : crc setup
  • execute : crc start
  • now I also install the following dependencies for my app :
yum install -y git-core podman
curl https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3 | bash
curl -L https://istio.io/downloadIstio | sh -
istioctl install --set profile=openshift --set values.pilot.env.ENABLE_TLS_ON_SIDECAR_INGRESS=true --set components.cni.enabled=true --set values.cni.repair.deletePods="true"
Install "yq" from https://github.com/mikefarah/yq/releases/download/4.42.1/${BINARY}.tar.gz
  • add the following to /etc/hosts :
127.0.0.1        api.crc.testing canary-openshift-ingress-canary.apps-crc.testing console-openshift-console.apps-crc.testing default-route-openshift-image-registry.apps-crc.testing downloads-openshift-console.apps-crc.testing host.crc.testing oauth-openshift.apps-crc.testing
127.0.0.1        controlplane.obpee00.com openldap.obpee00.com
  • also add those entries to my client machine , but instead of "127.0.0.1" I set the external IP
  • now I install an application developed by my company , the application runs , I can access it at "https://controlplane.obpee00.com" from my client machine
  • at this point I can still see the following open ports on the VM which hosts crc :
sudo ss -lnpt | grep crc
LISTEN   0        2048           127.0.0.1:6443          0.0.0.0:*       users:(("crc",pid=322571,fd=13))
LISTEN   0        2048           127.0.0.1:2222          0.0.0.0:*       users:(("crc",pid=322571,fd=11))
LISTEN   0        2048                   *:443                 *:*       users:(("crc",pid=322571,fd=14))
LISTEN   0        2048                   *:80                  *:*       users:(("crc",pid=322571,fd=15))

    • wait 20-30 minutes , after which "crc status" shows :
            crc status
            CRC VM:          Running
            OpenShift:       Unreachable (v4.18.2)
            Disk Usage:      0B of 0B (Inside the CRC VM)
            Cache Usage:     28.13GB
            Cache Directory: /home/azureuser/.crc/cache

  • the crc daemon no longer has any open ports in the output of "sudo ss -lnpt"
  • after stopping crc and re-starting it I can see the following :
            crc start
            ...
            INFO 2 operators are progressing: image-registry, network
            INFO 2 operators are progressing: image-registry, network
            INFO 2 operators are progressing: image-registry, network
            INFO 2 operators are progressing: image-registry, network
            INFO 2 operators are progressing: image-registry, network
            INFO 2 operators are progressing: image-registry, network
            WARN Cluster is not ready: cluster operators are still not stable after 10m1.500421844s
            INFO Adding crc-admin and crc-developer contexts to kubeconfig...
            Started the OpenShift cluster.
            ...

    • now the web page at "console-openshift-console.apps-crc.testing" is reachable , but the one at "https://controlplane.obpee00.com" is not working. Debugging with curl from my client machine I see :
            $ curl -k https://controlplane.obpee00.com -vv
            14:40:26.529000 [0-0] * Host controlplane.obpee00.com:443 was resolved.
            14:40:26.532000 [0-0] * IPv6: (none)
            14:40:26.533000 [0-0] * IPv4: 52.188.186.103
            14:40:26.535000 [0-0] * [HTTPS-CONNECT] created with 1 ALPNs -> 0
            14:40:26.537000 [0-0] * [HTTPS-CONNECT] added
            14:40:26.539000 [0-0] * [HTTPS-CONNECT] connect, init
            14:40:26.541000 [0-0] *   Trying 52.188.186.103:443...
            14:40:26.543000 [0-0] * [HTTPS-CONNECT] connect -> 0, done=0
            14:40:26.545000 [0-0] * [HTTPS-CONNECT] adjust_pollset -> 1 socks
            14:40:26.546000 [0-0] * [HTTPS-CONNECT] connect -> 0, done=0
            14:40:26.548000 [0-0] * [HTTPS-CONNECT] adjust_pollset -> 1 socks
            14:40:26.682000 [0-0] * schannel: disabled automatic use of client certificate
            14:40:26.696000 [0-0] * ALPN: curl offers http/1.1
            14:40:26.702000 [0-0] * [HTTPS-CONNECT] connect -> 0, done=0
            14:40:26.705000 [0-0] * [HTTPS-CONNECT] adjust_pollset -> 1 socks
            14:40:26.835000 [0-0] * schannel: failed to receive handshake, SSL/TLS connection failed
            14:40:26.844000 [0-0] * [HTTPS-CONNECT] connect, all failed
            14:40:26.849000 [0-0] * [HTTPS-CONNECT] connect -> 35, done=0
            14:40:26.854000 [0-0] * closing connection #0
            14:40:26.856000 [0-0] * [HTTPS-CONNECT] close
            14:40:26.858000 [0-0] * [SETUP] close
            14:40:26.860000 [0-0] * [SETUP] destroy
            14:40:26.861000 [0-0] * [HTTPS-CONNECT] destroy
            curl: (35) schannel: failed to receive handshake, SSL/TLS connection failed
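
The two symptoms in the steps above (OpenShift reported as Unreachable, and the crc listeners disappearing) can be pulled out of the same commands with a couple of small filters. This is only a sketch and not part of crc: the function names crc_ports and openshift_state are made up for illustration, and it assumes the output layouts shown above.

```shell
# Sketch only: crc_ports and openshift_state are hypothetical helper names.
# Both read command output on stdin, so they can be tried on saved output.

# crc_ports: from `sudo ss -lnpt` output, print the local port of every
# listener owned by a process named "crc", one port per line.
crc_ports() {
  awk '/users:\(\("crc"/ { n = split($4, a, ":"); print a[n] }'
}

# openshift_state: from `crc status` output, print the first word of the
# "OpenShift:" field (for example "Unreachable").
openshift_state() {
  awk '$1 == "OpenShift:" { print $2 }'
}

# Intended use on the host (not executed here):
#   sudo ss -lnpt | crc_ports
#   crc status | openshift_state
```

Running both from a loop with a timestamp would show when, within the 20-30 minute window, the listeners go away and the state changes.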

I have attached the contents of "crc.log" and "crcd.log"

crc.log crcd.log

Please let me know if there are more steps to debug.

cozsmin commented May 30 '25 11:05

I tracked down the pod which was supposed to listen on https://controlplane.obpee00.com/

  kubectl -n istio-system get pod istiod-9c7f9cb4f-5snlj  -o yaml | grep ^status: -A30
  status:
    conditions:
    - lastProbeTime: null
      lastTransitionTime: "2025-05-30T11:27:06Z"
      message: '0/1 nodes are available: 1 node(s) had untolerated taint {node.kubernetes.io/disk-pressure:
        }. preemption: 0/1 nodes are available: 1 Preemption is not helpful for scheduling.'
      reason: Unschedulable
      status: "False"
      type: PodScheduled
    phase: Pending
    qosClass: Burstable

Strange I would say because the node is UP :

  kubectl get nodes -n istio-system
  NAME   STATUS   ROLES                         AGE   VERSION
  crc    Ready    control-plane,master,worker   74d   v1.31.6

Not sure if this is a crc-only issue or a general Kubernetes one , but I expect pods to be schedulable if the node is ready
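
For reference, "kubectl get nodes" derives STATUS from the node's Ready condition only, while the scheduler also honours taints, so a Ready node can still reject pods while it carries node.kubernetes.io/disk-pressure. Below is a minimal sketch for checking both, assuming the node is named crc as in the output above. has_taint and condition_status are hypothetical helper names, not kubectl features.

```shell
# Sketch only: has_taint and condition_status are hypothetical helper names.
# They parse text printed by kubectl, so they can be tried without a cluster.

# has_taint KEY: succeed if KEY is among the space-separated taint keys on stdin.
# Intended input:
#   kubectl get node crc -o jsonpath='{.spec.taints[*].key}'
has_taint() {
  tr ' ' '\n' | grep -qxF -- "$1"
}

# condition_status TYPE: print the status of one node condition, given
# "Type=Status" lines on stdin. Intended input:
#   kubectl get node crc -o jsonpath='{range .status.conditions[*]}{.type}={.status}{"\n"}{end}'
condition_status() {
  awk -F= -v t="$1" '$1 == t { print $2 }'
}
```

If has_taint succeeds for node.kubernetes.io/disk-pressure while condition_status Ready prints True, the node is up but the kubelet considers its disk nearly full, which would match the Unschedulable message above.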

What do you say ?

cozsmin commented May 30 '25 11:05

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close. Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

openshift-bot commented Aug 29 '25 01:08

Stale issues rot after 30d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle rotten. Rotten issues close after an additional 30d of inactivity. Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle rotten /remove-lifecycle stale

openshift-bot commented Sep 28 '25 08:09

Rotten issues close after 30d of inactivity.

Reopen the issue by commenting /reopen. Mark the issue as fresh by commenting /remove-lifecycle rotten. Exclude this issue from closing again by commenting /lifecycle frozen.

/close

openshift-bot commented Oct 29 '25 00:10

@openshift-bot: Closing this issue.

In response to this:

Rotten issues close after 30d of inactivity.

Reopen the issue by commenting /reopen. Mark the issue as fresh by commenting /remove-lifecycle rotten. Exclude this issue from closing again by commenting /lifecycle frozen.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

openshift-ci[bot] commented Oct 29 '25 00:10