Podman machine gets stuck in STARTING state if interrupted during startup
Issue Description
It seems that a podman machine gets stuck in the STARTING state permanently if the user interrupts the startup sequence even once. To recover, either the entire system has to be restarted or the podman machine has to be deleted. This issue was initially reported in the Podman Desktop repo: https://github.com/containers/podman-desktop/issues/9670.
https://github.com/user-attachments/assets/3a0527db-c397-47a1-a290-6b9b0b72bf28
Steps to reproduce the issue
1. Create a new podman machine using the podman machine init command.
2. Start the podman machine created at point 1 using the podman machine start command.
3. Quickly after issuing the command from point 2, send CTRL-C (SIGINT) to the terminal to terminate the operation.
4. Run podman machine ls and notice that the podman machine created at point 1 is permanently stuck in the STARTING state.
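The steps above can be scripted. Below is a minimal Python sketch, assuming a POSIX host (macOS or Linux) with podman on the PATH and a machine already created with podman machine init. The helper names interrupt_after and reproduce and the 1-second delay are illustrative choices, not part of the report. Note that CTRL-C in a terminal delivers SIGINT to the whole foreground process group, while this sketch signals only the podman process itself, so the behaviour may differ slightly.

```python
import signal
import subprocess
import time


def interrupt_after(cmd, delay_seconds):
    """Start cmd, wait delay_seconds, send it SIGINT, and return its exit code."""
    proc = subprocess.Popen(cmd)
    time.sleep(delay_seconds)
    # Roughly what pressing CTRL-C does, but aimed at this one process only.
    proc.send_signal(signal.SIGINT)
    return proc.wait(timeout=60)


def reproduce(delay_seconds=1.0):
    """Hypothetical driver: interrupt 'podman machine start', then list machines."""
    code = interrupt_after(["podman", "machine", "start"], delay_seconds)
    print("podman machine start exited with", code)
    # If the bug triggers, the listing should report the machine as starting.
    subprocess.run(["podman", "machine", "ls"], check=False)
```

Calling reproduce() and reading the listing should show whether the machine was left in the STARTING state. The delay may need tuning so that the signal arrives while startup is still in progress.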
Describe the results you received
Podman machine is stuck permanently in STARTING state.
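For anyone who wants to check for this state from a script rather than by eye, here is a small Python sketch. It assumes that podman machine ls --format json prints a JSON array whose entries carry a Name string and a Starting boolean; that field layout is an assumption and should be verified against the installed podman version. The function names are made up for this example.

```python
import json
import subprocess


def starting_machines(ls_json):
    """Return the names of machines whose (assumed) 'Starting' field is true."""
    machines = json.loads(ls_json)
    return [m["Name"] for m in machines if m.get("Starting")]


def list_starting_machines():
    """Run 'podman machine ls --format json' and filter its output."""
    result = subprocess.run(
        ["podman", "machine", "ls", "--format", "json"],
        capture_output=True,
        text=True,
        check=True,
    )
    return starting_machines(result.stdout)
```

A machine that is legitimately booting will also be reported here, so a machine only counts as stuck if it keeps showing up long after the start command was interrupted.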
Describe the results you expected
Presumably the state of the podman machine should be Stopped, or alternatively Running if the SIGINT is not sent fast enough to prevent the startup. Regardless of its state, the podman machine should not freeze and should execute commands correctly afterwards.
podman info output
If you are unable to run podman info for any reason, please provide the podman version, operating system and its version and the architecture you are running.
Podman in a container
No
Privileged Or Rootless
None
Upstream Latest Release
Yes
Additional environment details
Additional information
Additional information like issue happens only occasionally or issue happens with a particular architecture or on a particular setting
Your video does not work for me. Please post an actual log or transcript. Why do you call this a race? On your fourth point, you say "will not be permanently stuck" and I think you mean to say "will be"? Can you confirm this and update what you are trying to show in the video? You also need to provide podman info as the template suggests.
@baude in what way is the video not working for you? I just watched it again, works fine, have you tried a different browser?
About point 4, you are correct, there was a "not" that should not have been there. I've edited the post.
UPDATE: I've also managed to reproduce the issue on macOS today, so it's not limited to Windows only.
When I click the video, it doesn't play. Either way, we prefer that people post text where possible rather than binary objects.
In text form: after the podman machine start command is issued, the user sends a SIGINT signal to the terminal, and that causes the problem. The issue is 100% reproducible.
More or less the same issue here, although I don't know whether it happened because I interrupted the startup sequence. My machine was stuck in the STARTING state and I couldn't do anything with it in the desktop (I'm on Windows).
I had to use podman machine rm podman-machine-default to remove it and then I was able to recreate it properly on the desktop
> podman info
host:
  arch: amd64
  buildahVersion: 1.38.0
  cgroupControllers:
  - cpuset
  - cpu
  - cpuacct
  - blkio
  - memory
  - devices
  - freezer
  - net_cls
  - perf_event
  - net_prio
  - hugetlb
  - pids
  - rdma
  - misc
  cgroupManager: cgroupfs
  cgroupVersion: v1
  conmon:
    package: conmon-2.1.12-2.fc40.x86_64
    path: /usr/bin/conmon
    version: 'conmon version 2.1.12, commit: '
  cpuUtilization:
    idlePercent: 99.47
    systemPercent: 0.36
    userPercent: 0.17
  cpus: 8
  databaseBackend: sqlite
  distribution:
    distribution: fedora
    variant: container
    version: "40"
  eventLogger: journald
  freeLocks: 2048
  hostname: Bagheera
  idMappings:
    gidmap: null
    uidmap: null
  kernel: 5.15.167.4-microsoft-standard-WSL2
  linkmode: dynamic
  logDriver: journald
  memFree: 7041286144
  memTotal: 8216662016
  networkBackend: netavark
  networkBackendInfo:
    backend: netavark
    dns:
      package: aardvark-dns-1.13.1-1.fc40.x86_64
      path: /usr/libexec/podman/aardvark-dns
      version: aardvark-dns 1.13.1
    package: netavark-1.13.0-1.fc40.x86_64
    path: /usr/libexec/podman/netavark
    version: netavark 1.13.0
  ociRuntime:
    name: crun
    package: crun-1.18.2-1.fc40.x86_64
    path: /usr/bin/crun
    version: |-
      crun version 1.18.2
      commit: 00ab38af875ddd0d1a8226addda52e1de18339b5
      rundir: /run/crun
      spec: 1.0.0
      +SYSTEMD +SELINUX +APPARMOR +CAP +SECCOMP +EBPF +CRIU +LIBKRUN +WASM:wasmedge +YAJL
  os: linux
  pasta:
    executable: /usr/bin/pasta
    package: passt-0^20241127.gc0fbc7e-1.fc40.x86_64
    version: |
      pasta 0^20241127.gc0fbc7e-1.fc40.x86_64
      Copyright Red Hat
      GNU General Public License, version 2 or later
      <https://www.gnu.org/licenses/old-licenses/gpl-2.0.html>
      This is free software: you are free to change and redistribute it.
      There is NO WARRANTY, to the extent permitted by law.
  remoteSocket:
    exists: true
    path: unix:///run/podman/podman.sock
  rootlessNetworkCmd: pasta
  security:
    apparmorEnabled: false
    capabilities: CAP_CHOWN,CAP_DAC_OVERRIDE,CAP_FOWNER,CAP_FSETID,CAP_KILL,CAP_NET_BIND_SERVICE,CAP_SETFCAP,CAP_SETGID,CAP_SETPCAP,CAP_SETUID,CAP_SYS_CHROOT
    rootless: false
    seccompEnabled: true
    seccompProfilePath: /usr/share/containers/seccomp.json
    selinuxEnabled: false
  serviceIsRemote: true
  slirp4netns:
    executable: ""
    package: ""
    version: ""
  swapFree: 2147483648
  swapTotal: 2147483648
  uptime: 0h 32m 0.00s
  variant: ""
plugins:
  authorization: null
  log:
  - k8s-file
  - none
  - passthrough
  - journald
  network:
  - bridge
  - macvlan
  - ipvlan
  volume:
  - local
registries:
  search:
  - docker.io
store:
  configFile: /usr/share/containers/storage.conf
  containerStore:
    number: 0
    paused: 0
    running: 0
    stopped: 0
  graphDriverName: overlay
  graphOptions:
    overlay.imagestore: /usr/lib/containers/storage
    overlay.mountopt: nodev,metacopy=on
  graphRoot: /var/lib/containers/storage
  graphRootAllocated: 1081101176832
  graphRootUsed: 876167168
  graphStatus:
    Backing Filesystem: extfs
    Native Overlay Diff: "false"
    Supports d_type: "true"
    Supports shifting: "false"
    Supports volatile: "true"
    Using metacopy: "true"
  imageCopyTmpDir: /var/tmp
  imageStore:
    number: 0
  runRoot: /run/containers/storage
  transientStore: false
  volumePath: /var/lib/containers/storage/volumes
version:
  APIVersion: 5.3.1
  Built: 1732147200
  BuiltTime: Thu Nov 21 01:00:00 2024
  GitCommit: ""
  GoVersion: go1.22.7
  Os: linux
  OsArch: linux/amd64
  Version: 5.3.1
Seeing this today on Mac (up to date, 15.3). New podman user.
Tried the podman machine rm podman-machine-default workaround, re-initing and re-starting without any joy.
podman machine start
Starting machine "podman-machine-default"
... never returns
podman machine ls
NAME                     VM TYPE  CREATED         LAST UP             CPUS  MEMORY  DISK SIZE
podman-machine-default*  applehv  40 minutes ago  Currently starting  4     2GiB    100GiB
podman info
OS: darwin/amd64
buildOrigin: pkginstaller
provider: applehv
version: 5.4.0
Cannot connect to Podman. Please verify your connection to the Linux system using `podman system connection list`, or try `podman machine init` and `podman machine start` to manage a new Linux VM
Error: unable to connect to Podman socket: failed to connect: dial tcp 127.0.0.1:61995: connect: connection refused
Any other ideas, or info I can provide to help debug?
@smallsaucepan try podman machine rm -f to delete the currently stuck podman machine.
Thanks for the suggestion @cbr7. No luck though.
Tried with debug log level on machine start. The logging below appears and a window opens displaying grub bootloader for a second before displaying "Booting `Fedora CoreOS 41.20..." which then hangs. CPU goes to 400% for at least 20 minutes without any sign of progress in either window.
INFO[0000] boot parameters: &{EFIVariableStorePath:/Users/james/.local/share/containers/podman/machine/applehv/efi-bl-podman-machine-default CreateVariableStore:true}
INFO[0000]
INFO[0000] virtual machine parameters:
INFO[0000] vCPUs: 4
INFO[0000] memory: 2048 MiB
INFO[0000]
INFO[0000] Adding virtio-blk device (imagePath: /Users/james/.local/share/containers/podman/machine/applehv/podman-machine-default-amd64.raw)
INFO[0000] Adding virtio-rng device
INFO[0000] Adding virtio-vsock device
INFO[0000] Adding virtio-serial device (logFile: /var/folders/cq/2lcbk22j3j13215bz7y3nf2r0000gn/T/podman/podman-machine-default.log)
INFO[0000] Adding virtio-net device (nat: false macAddress: [5a:94:ef:e4:0c:ee])
INFO[0000] Using unix socket /var/folders/cq/2lcbk22j3j13215bz7y3nf2r0000gn/T/podman/podman-machine-default-gvproxy.sock
INFO[0000] local: /var/folders/cq/2lcbk22j3j13215bz7y3nf2r0000gn/T/podman/vfkit-15915-b2d0.sock remote: /var/folders/cq/2lcbk22j3j13215bz7y3nf2r0000gn/T/podman/podman-machine-default-gvproxy.sock
INFO[0000] Adding virtio-fs device
INFO[0000] Adding virtio-fs device
INFO[0000] Adding virtio-fs device
INFO[0000] Adding virtio-gpu device
INFO[0000] Adding virtio-input pointing device
INFO[0000] Adding virtio-input keyboard device
INFO[0000] virtual machine is running
INFO[0000] Exposing vsock port 1025 on /var/folders/cq/2lcbk22j3j13215bz7y3nf2r0000gn/T/podman/podman-machine-default.sock (listening)
INFO[0000] Exposing vsock port 1024 on /Users/james/.local/share/containers/podman/machine/applehv/podman-machine-default-ignition.sock (listening)
INFO[0000] waiting for VM to stop
2025-02-15 00:56:42.092 vfkit[88341:305874] +[IMKClient subclass]: chose IMKClient_Modern
2025-02-15 00:56:42.092 vfkit[88341:305874] +[IMKInputSession subclass]: chose IMKInputSession_Modern
Appears to get stuck here while CoreOS fails to boot in the other window.
@smallsaucepan I think you're hitting https://github.com/containers/podman/issues/25121
@smallsaucepan when all else fails restart the macbook/system, upon restart the podman machine will be off and you will be able to delete it.
EDIT: also what @benoitf said, you seem to have a different issue than what is reported in this ticket.
Thanks @benoitf and @cbr7. That's a much better fit to what I'm seeing.
A fix for this that worked for me is mentioned in #9670.
I encountered the situation after installing fresh versions of Desktop and podman's CLI after a six-month hiatus from doing any podman-related work with either. I'm guessing the state left recorded on disk (Started: true) may have prevented the engine from starting, despite the commit that closed this as completed.
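To illustrate the kind of stale on-disk state guessed at above, here is a hypothetical Python sketch of the general pattern. It is not podman's implementation: the state file layout, the "Starting" key, and the function names are all invented for this example. The idea is that a flag written at the beginning of startup must be cleared on every exit path, including an interrupt, or it stays set forever.

```python
import json
import os
import tempfile
from contextlib import contextmanager


def write_state(path, state):
    """Write the state file atomically so it is never left half-written."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as tmp:
        json.dump(state, tmp)
    os.replace(tmp_path, path)


def read_state(path):
    """Load the state file back as a dict."""
    with open(path) as f:
        return json.load(f)


@contextmanager
def starting_flag(path):
    """Mark a (hypothetical) machine as starting and always clear the mark."""
    write_state(path, {"Starting": True})
    try:
        yield
    finally:
        # Runs on success, on errors, and on KeyboardInterrupt (CTRL-C).
        write_state(path, {"Starting": False})
```

Without the finally clause, an interrupt between the two writes would leave "Starting": true on disk, which resembles the symptom described in this issue. A process that is killed outright (for example with SIGKILL) skips the cleanup either way, so a real implementation would presumably also need to detect and repair stale state on the next run.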
In what version of podman is this fix supposed to be released? I was just able to reproduce this using podman 5.5.2.
https://github.com/user-attachments/assets/f63f612d-acf6-4852-b0c0-812586086df4
It should be fixed in 5.6