OpenShift Local version 4.18.2 becomes unreachable if the environment is idle for some time.
General information
The issue only occurs after some time, roughly 1 hour of idleness.
[admin@ocp1 ~]$ crc status
CRC VM: Running
OpenShift: Unreachable (v4.18.2)
Disk Usage: 0B of 0B (Inside the CRC VM)
Cache Usage: 28.13GB
Cache Directory: /home/admin/.crc/cache
[admin@ocp1 ~]$
[admin@ocp1 ~]$ sudo cat /var/log/libvirt/qemu/crc.log
2025-04-15 16:14:19.999+0000: Starting external device: virtiofsd
/usr/libexec/virtiofsd --fd=34 --shared-dir /home/admin
2025-04-15 16:14:20.011+0000: starting up libvirt version: 10.5.0, package: 7.5.el9_5 (Red Hat, Inc. <http://bugzilla.redhat.com/bugzilla>, 2025-01-21-08:08:49, ), qemu version: 9.0.0qemu-kvm-9.0.0-10.el9_5.2, kernel: 5.14.0-427.42.1.el9_4.x86_64, hostname: ocp1.fyre.ibm.com
LC_ALL=C \
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin \
HOME=/var/lib/libvirt/qemu/domain-1-crc \
XDG_DATA_HOME=/var/lib/libvirt/qemu/domain-1-crc/.local/share \
XDG_CACHE_HOME=/var/lib/libvirt/qemu/domain-1-crc/.cache \
XDG_CONFIG_HOME=/var/lib/libvirt/qemu/domain-1-crc/.config \
/usr/libexec/qemu-kvm \
-name guest=crc,debug-threads=on \
-S \
-object '{"qom-type":"secret","id":"masterKey0","format":"raw","file":"/var/lib/libvirt/qemu/domain-1-crc/master-key.aes"}' \
-blockdev '{"driver":"file","filename":"/usr/share/edk2/ovmf/OVMF_CODE.fd","node-name":"libvirt-pflash0-storage","auto-read-only":true,"discard":"unmap"}' \
-blockdev '{"node-name":"libvirt-pflash0-format","read-only":true,"driver":"raw","file":"libvirt-pflash0-storage"}' \
-blockdev '{"driver":"file","filename":"/var/lib/libvirt/qemu/nvram/crc_VARS.fd","node-name":"libvirt-pflash1-storage","read-only":false}' \
-machine pc-q35-rhel9.4.0,usb=off,dump-guest-core=off,memory-backend=pc.ram,pflash0=libvirt-pflash0-format,pflash1=libvirt-pflash1-storage,acpi=on \
-accel kvm \
Booting `Red Hat Enterprise Linux CoreOS 418.94.202502250906-0 (ostree:0)'
[admin@ocp1 ~]$ oc login -u developer https://api.crc.testing:6443
error: dial tcp 127.0.0.1:6443: connect: connection refused - verify you have provided the correct host and port and that the server is currently running.
Operating System
Linux
Hypervisor
KVM
Did you run crc setup before crc start?
yes
Running on
Laptop
Steps to reproduce
A running crc instance abruptly stops after some time of idleness.
CRC version
4.18.2
CRC status
[admin@ocp1 ~]$ crc status --log-level debug
DEBU CRC version: 2.49.0+e843be
DEBU OpenShift version: 4.18.2
DEBU MicroShift version: 4.18.2
DEBU Running 'crc status'
CRC VM: Running
OpenShift: Unreachable (v4.18.2)
Disk Usage: 0B of 0B (Inside the CRC VM)
Cache Usage: 28.13GB
Cache Directory: /home/admin/.crc/cache
CRC config
[admin@ocp1 ~]$ crc config view
- consent-telemetry : no
- cpus : 8
- disk-size : 100
- enable-cluster-monitoring : true
- memory : 32768
- pull-secret-file : /home/admin/apps/ocp/pull-secret.txt
Host Operating System
[admin@ocp1 ~]$ cat /etc/os-release
NAME="Red Hat Enterprise Linux"
VERSION="9.5 (Plow)"
ID="rhel"
ID_LIKE="fedora"
VERSION_ID="9.5"
PLATFORM_ID="platform:el9"
PRETTY_NAME="Red Hat Enterprise Linux 9.5 (Plow)"
ANSI_COLOR="0;31"
LOGO="fedora-logo-icon"
CPE_NAME="cpe:/o:redhat:enterprise_linux:9::baseos"
HOME_URL="https://www.redhat.com/"
DOCUMENTATION_URL="https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/9"
BUG_REPORT_URL="https://issues.redhat.com/"
REDHAT_BUGZILLA_PRODUCT="Red Hat Enterprise Linux 9"
REDHAT_BUGZILLA_PRODUCT_VERSION=9.5
REDHAT_SUPPORT_PRODUCT="Red Hat Enterprise Linux"
REDHAT_SUPPORT_PRODUCT_VERSION="9.5"
Expected behavior
crc should not stop.
Actual behavior
crc abruptly stops
CRC Logs
A restart works, but one is required roughly every hour.
Additional context
No response
The issue is noted only when running with network-mode set to user. When it is set to system, with an HAProxy redirecting requests to the OpenShift IP and port, the issue does not occur.
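To check which mode a given setup is using, something like the following may help. It is only a minimal sketch: the helper name `network_mode_from_config` is made up for illustration, and it assumes `crc config view` prints entries in the `- key : value` form shown in the report above and lists only keys that were set explicitly.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `crc config view` output on stdin and print the
# configured network-mode, or "unset" when the key is not listed.
network_mode_from_config() {
    awk -F':' '
        /^- *network-mode[[:space:]]*:/ {
            gsub(/[[:space:]]/, "", $2); print $2; found = 1
        }
        END { if (!found) print "unset" }
    '
}

# Only query crc when invoked with "run", so the file can be sourced safely.
if [ "${1:-}" = "run" ]; then
    crc config view | network_mode_from_config
fi
```

Switching would then be done with `crc config set network-mode system`. I believe the change only takes effect after `crc cleanup` and `crc setup` are run again, so check that against the CRC documentation before relying on it.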
When this happens, can you still ssh into the cluster? https://github.com/crc-org/engineering-docs/blob/main/content/Debugging.md#access-the-vm (the linux instructions are outdated…)
$ ssh -i ~/.crc/machines/crc/id_ed25519 -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -p 2222 core@127.0.0.1
@cfergeau
Thanks for the quick response. It does not work.
[admin@system1 crc]$ ls -lrt
total 21258324
-rw------- 1 admin admin 81 Apr 22 08:44 id_ed25519.pub
-rw------- 1 admin admin 387 Apr 22 08:44 id_ed25519
-rw------- 1 admin admin 23 Apr 22 08:44 kubeadmin-password
srwxr-xr-x 1 admin admin 0 Apr 22 08:44 docker.sock
-rw------- 1 admin admin 901 Apr 22 08:44 config.json
-rw------- 1 admin admin 15275 Apr 22 08:49 kubeconfig
-rw-r--r-- 1 qemu qemu 21767782400 Apr 22 21:04 crc.qcow2
[admin@system1 crc]$
[admin@system1 crc]$ ssh -i ~/.crc/machines/crc/id_ed25519 -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -p 2222 core@127.0.0.1
ssh: connect to host 127.0.0.1 port 2222: Connection refused
You can find some explanations for typical errors at this link:
https://red.ht/support_rhel_ssh
[admin@system1 crc]$
[admin@system1 crc]$ crc status
CRC VM: Running
OpenShift: Unreachable (v4.18.2)
Disk Usage: 0B of 0B (Inside the CRC VM)
Cache Usage: 28.13GB
Cache Directory: /home/admin/.crc/cache
[admin@system1 crc]$
Can you check systemctl --user status crc-daemon.service, and also check the logs for that service using journalctl --user-unit crc-daemon.service?
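If it helps, both outputs can be captured in one file for attaching to this issue. This is a rough sketch, not an official CRC tool: the function name, the default output file name, and the two-hour window are arbitrary choices of mine.

```shell
#!/usr/bin/env bash
# Sketch: save crc-daemon status and recent logs to a file.
# The function name and default output path are hypothetical.
collect_crc_daemon_logs() {
    local out="${1:-crc-daemon-diagnostics.txt}"
    {
        echo "== systemctl --user status crc-daemon.service =="
        # `status` exits non-zero when the unit is not active; keep going anyway.
        systemctl --user status crc-daemon.service --no-pager || true
        echo "== journalctl --user-unit crc-daemon.service =="
        journalctl --user-unit crc-daemon.service --no-pager --since "2 hours ago" || true
    } > "$out" 2>&1
    # Print the path so the caller knows where the report went.
    echo "$out"
}

# Only collect when invoked with "run", so the file can be sourced safely.
if [ "${1:-}" = "run" ]; then
    collect_crc_daemon_logs "${2:-}"
fi
```

Run it as the same user that started crc, since the daemon is a user-scoped unit.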
Hi @praveenkumar @cfergeau, I've noticed the exact same thing on my setup:
[user@crc-host ~]$ crc status
CRC VM: Running
OpenShift: Unreachable (v4.18.2)
Disk Usage: 0B of 0B (Inside the CRC VM)
Cache Usage: 28.13GB
Cache Directory: /home/user/.crc/cache
[user@crc-host ~]$ oc get pod
The connection to the server api.crc.testing:6443 was refused - did you specify the right host or port?
[user@crc-host ~]$ ssh crc
ssh: connect to host 127.0.0.1 port 2222: Connection refused
[user@crc-host ~]$ cat ~/.ssh/config
Host crc
Hostname 127.0.0.1
Port 2222
User core
IdentityFile ~/.crc/machines/crc/id_ecdsa
IdentityFile ~/.crc/machines/crc/id_ed25519
StrictHostKeyChecking no
UserKnownHostsFile /dev/null
[user@crc-host ~]$
This is the version I'm using:
[user@crc-host ~]$ crc version
CRC version: 2.51.0+80aa80
OpenShift version: 4.18.2
MicroShift version: 4.18.2
[user@crc-host ~]$
The latest version points to CRC version 2.51.0 (above) with OpenShift bundle 4.18.2.
Any ideas why it is stuck / unreachable? It's also impossible to see the console of the VM:
[user@crc-host ~]$ sudo virsh console crc
error: internal error: character device serial0 is not using a PTY
[user@crc-host ~]$
So no logs whatsoever once the RHCOS guest VM enters this state. Very hard to debug and find out what happened.
A work-around is to reboot it every 1h with a cronjob:
crc stop
crc start
But this is obviously far from being ideal.
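A slightly less blunt variant of the same workaround is to restart only when `crc status` actually reports the cluster as unreachable. The sketch below assumes the status line keeps the `OpenShift: Unreachable` wording shown above; the function names are hypothetical, and it does nothing about the underlying cause.

```shell
#!/usr/bin/env bash
# Hypothetical watchdog: restart CRC only when `crc status` reports the
# cluster as unreachable, instead of restarting unconditionally every hour.

# Return 0 if the status text on stdin has an "OpenShift: Unreachable" line.
status_is_unreachable() {
    grep -Eq '^OpenShift:[[:space:]]+Unreachable'
}

# Check the live status and restart CRC when the cluster is unreachable.
restart_if_unreachable() {
    if crc status 2>&1 | status_is_unreachable; then
        echo "$(date '+%Y-%m-%dT%H:%M:%S') OpenShift unreachable, restarting CRC"
        crc stop
        crc start
    fi
}

# Only act when invoked with "run", so the file can be sourced safely.
if [ "${1:-}" = "run" ]; then
    restart_if_unreachable
fi
```

From cron this would be called with the `run` argument, for example every 10 minutes. Note that cron's PATH may not include the directory where `crc` is installed.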
I'll give an older version, 2.48.0 (with OpenShift 4.18.1) a try and see if it solves it.
I'd be interested to hear your thoughts on how to debug it!
When I use an older version, 2.48.0 (with OpenShift 4.18.1) then everything seems to be healthy and it works:
[ofircohen@crc-host ~]$ crc status
CRC VM: Running
OpenShift: Running (v4.18.1)
Disk Usage: 27.87GB of 128.2GB (Inside the CRC VM)
Cache Usage: 27.93GB
Cache Directory: /home/ofircohen/.crc/cache
[ofircohen@crc-host ~]$ oc get pod
NAME READY STATUS RESTARTS AGE
wiz-integration-agent-6c4477f6d5-dbpdz 1/1 Running 0 4h15m
[ofircohen@crc-host ~]$ ssh crc uptime
Warning: Permanently added '[127.0.0.1]:2222' (ED25519) to the list of known hosts.
no such identity: /home/ofircohen/.crc/machines/crc/id_ecdsa: No such file or directory
01:23:21 up 4:27, 0 users, load average: 1.41, 1.09, 1.21
[ofircohen@crc-host ~]$
What could be the issue? Why does 4.18.2 freeze / get stuck? https://developers.redhat.com/content-gateway/rest/mirror/pub/openshift-v4/clients/crc/latest/crc-linux-amd64.tar.xz
Is there a way to debug/troubleshoot this?
Thanks!
I can confirm that it works with crc 2.48.0 and OpenShift 4.18.1:
[user@crc-host ~]$ ssh crc uptime
Warning: Permanently added '[127.0.0.1]:2222' (ED25519) to the list of known hosts.
no such identity: /home/user/.crc/machines/crc/id_ecdsa: No such file or directory
11:20:41 up 14:24, 0 users, load average: 0.45, 0.73, 0.92
[user@crc-host ~]$ crc status
CRC VM: Running
OpenShift: Running (v4.18.1)
Disk Usage: 31.65GB of 128.2GB (Inside the CRC VM)
Cache Usage: 27.93GB
Cache Directory: /home/user/.crc/cache
[user@crc-host ~]$ crc version
WARN A new version (2.51.0) has been published on https://developers.redhat.com/content-gateway/file/pub/openshift-v4/clients/crc/2.51.0/crc-linux-amd64.tar.xz
CRC version: 2.48.0+1aa46c
OpenShift version: 4.18.1
MicroShift version: 4.18.1
[user@crc-host ~]$
So there seems to be a regression with crc 2.51.0 with OpenShift 4.18.2.
This one works: https://developers.redhat.com/content-gateway/file/pub/openshift-v4/clients/crc/2.48.0/crc-linux-amd64.tar.xz
This one doesn't: https://developers.redhat.com/content-gateway/rest/mirror/pub/openshift-v4/clients/crc/latest/crc-linux-amd64.tar.xz
It would be nice to get some more debugging/troubleshooting tooling around the libvirt/qemu VM to be able to fetch some more useful diagnostics.
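Until something like that exists, the host side can at least be snapshotted with stock libvirt commands. A minimal sketch, assuming the domain is named `crc` and its log lives under /var/log/libvirt/qemu/ as in the original report; the function name is hypothetical. It only shows what libvirt/QEMU see from the host, not what is happening inside the guest.

```shell
#!/usr/bin/env bash
# Sketch: gather host-side libvirt/QEMU information about the CRC domain.
# Function name and output layout are hypothetical.
collect_crc_vm_state() {
    local domain="${1:-crc}"
    echo "== virsh domstate --reason =="
    sudo virsh domstate "$domain" --reason || true
    echo "== virsh dominfo =="
    sudo virsh dominfo "$domain" || true
    echo "== last 50 lines of the QEMU log =="
    sudo tail -n 50 "/var/log/libvirt/qemu/${domain}.log" || true
}

# Only collect when invoked with "run", so the file can be sourced safely.
if [ "${1:-}" = "run" ]; then
    collect_crc_vm_state "${2:-}"
fi
```

Comparing the output taken while the cluster is healthy with the output taken once it is unreachable might show whether the VM was paused or whether QEMU logged anything at the time.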
As a data point, can you check this https://github.com/crc-org/crc/issues/4730#issuecomment-2829644362 ?
The systemctl --user status crc-daemon.service and journalctl --user-unit crc-daemon.service outputs were fine. There were Input/Output errors on the vsock because it got stuck on the guest VM side of things.
There are no useful diagnostics from the logs or from the daemons. We don't forward systemd-journald or the kernel ring buffer (dmesg) back to the host, so I'm afraid it's still a black box.
I have set up an end-to-end guide on how to bring up this cluster on GCE: https://www.linkedin.com/posts/cohen-ofir_kubernetes-openshift-rhel-activity-7341589911390539776-cz6G?utm_source=share&utm_medium=member_desktop&rcm=ACoAAAVWybwBi9G-qEFOErHtBmK_-tkEhlPZ_cc
Note that I had to use an extremely generous VM to accommodate the crc guest VM:
⚙️ My Setup (recommended for smoother experience)
Host: GCE n2-standard-8 VM (32GiB RAM, 200GiB disk)
↳ Enable nested virtualization
↳ RHEL 9 image (rhel-cloud/rhel-9)
Guest CRC config: crc 2.48.0
curl -LO https://developers.redhat.com/content-gateway/file/pub/openshift-v4/clients/crc/2.48.0/crc-linux-amd64.tar.xz
crc config set memory 20000
crc config set cpus 8
crc config set disk-size 120
crc setup
crc start -p pull-secret.txt
CRC is very resource-hungry:
[ofircohen@crc-host ~]$ crc status
CRC VM: Running
OpenShift: Running (v4.18.1)
Disk Usage: 48.79GB of 128.2GB (Inside the CRC VM)
Cache Usage: 27.93GB
Cache Directory: /home/ofircohen/.crc/cache
[ofircohen@crc-host ~]$
Notice how it expanded from 25GiB disk space to ~50GiB after just 2 days of running straight:
[user@crc-host ~]$ ssh crc uptime
Warning: Permanently added '[127.0.0.1]:2222' (ED25519) to the list of known hosts.
22:49:23 up 2 days, 1:53, 0 users, load average: 1.01, 0.83, 0.77
[user@crc-host ~]$
@codersyacht
I was also facing the same issue. The problem was with crc-daemon.service: it runs in the user scope, so it is stopped when the user session ends (on logout).
This can be fixed by enabling "Linger" for the user:
sudo loginctl enable-linger $USER
For more, read: enable-linger
Once it is enabled, you can restart crc to verify.
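To confirm that lingering is actually on, `loginctl show-user` can be queried for the `Linger` property. A small sketch (the helper name is hypothetical):

```shell
#!/usr/bin/env bash
# Return 0 if the `loginctl show-user` output on stdin reports Linger=yes.
linger_is_enabled() {
    grep -qx 'Linger=yes'
}

# Only query loginctl when invoked with "run", so the file can be sourced safely.
if [ "${1:-}" = "run" ]; then
    if loginctl show-user "$USER" --property=Linger | linger_is_enabled; then
        echo "linger is enabled for $USER"
    else
        echo "linger is NOT enabled for $USER"
    fi
fi
```

With linger enabled, the user's systemd instance (and with it crc-daemon.service) should keep running after the SSH or desktop session that started crc is closed.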