Podman machine gets stuck in STARTING state if interrupted during startup

Open cbr7 opened this issue 1 year ago • 14 comments

Issue Description

It seems that the podman machine gets stuck in the STARTING state permanently if the user interrupts the startup sequence even once. To recover, either the entire system has to be restarted or the podman machine has to be deleted. This issue was initially reported in the Podman Desktop repo: https://github.com/containers/podman-desktop/issues/9670.

https://github.com/user-attachments/assets/3a0527db-c397-47a1-a290-6b9b0b72bf28

Steps to reproduce the issue

  1. Create a new podman machine using the podman machine init command.
  2. Start the podman machine created in step 1 using the podman machine start command.
  3. Shortly after issuing the command from step 2, press CTRL-C in the terminal (SIGINT) to interrupt the operation.
  4. Run podman machine ls and notice that the podman machine created in step 1 is permanently stuck in the STARTING state.
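
The steps above can be scripted. The following is a minimal Python sketch, assuming a POSIX host (macOS or Linux) with podman on the PATH. The machine name `repro-machine`, the `delay_seconds` parameter, and both function names are hypothetical. Note that `send_signal` delivers SIGINT only to the podman process, whereas CTRL-C in a terminal reaches the whole foreground process group, so the outcome may differ from the manual steps.

```python
import shutil
import signal
import subprocess
import time


def repro_commands(name="repro-machine"):
    """Return the podman commands from steps 1, 2 and 4 as argv lists."""
    return {
        "init": ["podman", "machine", "init", name],
        "start": ["podman", "machine", "start", name],
        "list": ["podman", "machine", "ls"],
    }


def reproduce(name="repro-machine", delay_seconds=2.0):
    """Run steps 1-4 and return the text printed by `podman machine ls`."""
    if shutil.which("podman") is None:
        raise RuntimeError("podman not found on PATH")
    cmds = repro_commands(name)
    subprocess.run(cmds["init"], check=True)   # step 1: create the machine
    start = subprocess.Popen(cmds["start"])    # step 2: begin startup
    time.sleep(delay_seconds)                  # "shortly after" (assumed delay)
    start.send_signal(signal.SIGINT)           # step 3: same signal CTRL-C sends
    start.wait()
    listing = subprocess.run(
        cmds["list"], capture_output=True, text=True, check=True
    )
    return listing.stdout                      # step 4: inspect the state
```

Calling `print(reproduce())` would run the sequence once; the delay may need tuning so that the signal arrives while startup is still in progress.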

Describe the results you received

Podman machine is stuck permanently in STARTING state.

Describe the results you expected

Presumably the state of the podman machine should be Stopped, or alternatively Running if the SIGINT is not sent fast enough to prevent the startup. Regardless of its state, the podman machine should not be frozen and should execute commands correctly afterwards.
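
To illustrate the expected behaviour, here is a small Python sketch of a "starting" flag that is always cleared, even when startup is interrupted. This is not Podman's code (Podman is written in Go) and makes no claim about how Podman stores machine state; `MachineState`, `starting_guard`, and `start_machine` are hypothetical names used only for illustration.

```python
from contextlib import contextmanager


class MachineState:
    """Hypothetical in-memory stand-in for a machine's recorded state."""

    def __init__(self):
        self.starting = False
        self.running = False


@contextmanager
def starting_guard(state):
    """Mark the machine as starting and always clear the flag on exit."""
    state.starting = True
    try:
        yield state
    finally:
        # Runs on success, on errors, and on KeyboardInterrupt (CTRL-C).
        state.starting = False


def start_machine(state, boot):
    """Run `boot` under the guard; the machine is running only if boot returns."""
    with starting_guard(state):
        boot()
        state.running = True
```

With this pattern an interrupted start leaves the state as neither starting nor running (that is, stopped), and a completed start leaves it as running.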

podman info output

If you are unable to run podman info for any reason, please provide the podman version, operating system and its version and the architecture you are running.

Podman in a container

No

Privileged Or Rootless

None

Upstream Latest Release

Yes

Additional environment details

Additional information

Additional information like issue happens only occasionally or issue happens with a particular architecture or on a particular setting

cbr7 avatar Oct 30 '24 10:10 cbr7

Your video does not work for me. Please post an actual log or transcript. Why do you call this a race? On your fourth point, you say "will not be permanently stuck" and I think you mean to say "will be"? Can you confirm this and update what you are trying to show in the video? You also need to provide podman info as the template suggests.

baude avatar Oct 30 '24 18:10 baude

@baude in what way is the video not working for you? I just watched it again, works fine, have you tried a different browser?

About point 4, you are correct: there was a "not" that should not have been there. I've edited the post.

cbr7 avatar Oct 30 '24 18:10 cbr7

Attached the output from podman info

info.txt

cbr7 avatar Oct 30 '24 18:10 cbr7

UPDATE: I've also managed to reproduce the issue on macOS today, so it's not limited to Windows only.

cbr7 avatar Oct 31 '24 12:10 cbr7

When I click the video, it doesn't play. Either way, we prefer that people post text where possible, as opposed to binary objects.

baude avatar Oct 31 '24 13:10 baude

In text: after the podman machine start command is issued, the user sends a SIGINT signal to the terminal, and that causes the problem. The issue is 100% reproducible.

cbr7 avatar Oct 31 '24 13:10 cbr7

A friendly reminder that this issue had no activity for 30 days.

github-actions[bot] avatar Dec 01 '24 00:12 github-actions[bot]

More or less the same issue here, although I don't know whether it happened because I interrupted the startup sequence. My machine was stuck in the STARTING state and I couldn't do anything with it in the desktop (I'm on Windows). I had to use podman machine rm podman-machine-default to remove it, and then I was able to recreate it properly in the desktop.

> podman info
host:
  arch: amd64
  buildahVersion: 1.38.0
  cgroupControllers:
  - cpuset
  - cpu
  - cpuacct
  - blkio
  - memory
  - devices
  - freezer
  - net_cls
  - perf_event
  - net_prio
  - hugetlb
  - pids
  - rdma
  - misc
  cgroupManager: cgroupfs
  cgroupVersion: v1
  conmon:
    package: conmon-2.1.12-2.fc40.x86_64
    path: /usr/bin/conmon
    version: 'conmon version 2.1.12, commit: '
  cpuUtilization:
    idlePercent: 99.47
    systemPercent: 0.36
    userPercent: 0.17
  cpus: 8
  databaseBackend: sqlite
  distribution:
    distribution: fedora
    variant: container
    version: "40"
  eventLogger: journald
  freeLocks: 2048
  hostname: Bagheera
  idMappings:
    gidmap: null
    uidmap: null
  kernel: 5.15.167.4-microsoft-standard-WSL2
  linkmode: dynamic
  logDriver: journald
  memFree: 7041286144
  memTotal: 8216662016
  networkBackend: netavark
  networkBackendInfo:
    backend: netavark
    dns:
      package: aardvark-dns-1.13.1-1.fc40.x86_64
      path: /usr/libexec/podman/aardvark-dns
      version: aardvark-dns 1.13.1
    package: netavark-1.13.0-1.fc40.x86_64
    path: /usr/libexec/podman/netavark
    version: netavark 1.13.0
  ociRuntime:
    name: crun
    package: crun-1.18.2-1.fc40.x86_64
    path: /usr/bin/crun
    version: |-
      crun version 1.18.2
      commit: 00ab38af875ddd0d1a8226addda52e1de18339b5
      rundir: /run/crun
      spec: 1.0.0
      +SYSTEMD +SELINUX +APPARMOR +CAP +SECCOMP +EBPF +CRIU +LIBKRUN +WASM:wasmedge +YAJL
  os: linux
  pasta:
    executable: /usr/bin/pasta
    package: passt-0^20241127.gc0fbc7e-1.fc40.x86_64
    version: |
      pasta 0^20241127.gc0fbc7e-1.fc40.x86_64
      Copyright Red Hat
      GNU General Public License, version 2 or later
        <https://www.gnu.org/licenses/old-licenses/gpl-2.0.html>
      This is free software: you are free to change and redistribute it.
      There is NO WARRANTY, to the extent permitted by law.
  remoteSocket:
    exists: true
    path: unix:///run/podman/podman.sock
  rootlessNetworkCmd: pasta
  security:
    apparmorEnabled: false
    capabilities: CAP_CHOWN,CAP_DAC_OVERRIDE,CAP_FOWNER,CAP_FSETID,CAP_KILL,CAP_NET_BIND_SERVICE,CAP_SETFCAP,CAP_SETGID,CAP_SETPCAP,CAP_SETUID,CAP_SYS_CHROOT
    rootless: false
    seccompEnabled: true
    seccompProfilePath: /usr/share/containers/seccomp.json
    selinuxEnabled: false
  serviceIsRemote: true
  slirp4netns:
    executable: ""
    package: ""
    version: ""
  swapFree: 2147483648
  swapTotal: 2147483648
  uptime: 0h 32m 0.00s
  variant: ""
plugins:
  authorization: null
  log:
  - k8s-file
  - none
  - passthrough
  - journald
  network:
  - bridge
  - macvlan
  - ipvlan
  volume:
  - local
registries:
  search:
  - docker.io
store:
  configFile: /usr/share/containers/storage.conf
  containerStore:
    number: 0
    paused: 0
    running: 0
    stopped: 0
  graphDriverName: overlay
  graphOptions:
    overlay.imagestore: /usr/lib/containers/storage
    overlay.mountopt: nodev,metacopy=on
  graphRoot: /var/lib/containers/storage
  graphRootAllocated: 1081101176832
  graphRootUsed: 876167168
  graphStatus:
    Backing Filesystem: extfs
    Native Overlay Diff: "false"
    Supports d_type: "true"
    Supports shifting: "false"
    Supports volatile: "true"
    Using metacopy: "true"
  imageCopyTmpDir: /var/tmp
  imageStore:
    number: 0
  runRoot: /run/containers/storage
  transientStore: false
  volumePath: /var/lib/containers/storage/volumes
version:
  APIVersion: 5.3.1
  Built: 1732147200
  BuiltTime: Thu Nov 21 01:00:00 2024
  GitCommit: ""
  GoVersion: go1.22.7
  Os: linux
  OsArch: linux/amd64
  Version: 5.3.1

leolivier avatar Dec 16 '24 09:12 leolivier

Seeing this today on Mac (up to date, 15.3). New podman user.

Tried the podman machine rm podman-machine-default workaround, re-initing and re-starting without any joy.

podman machine start

Starting machine "podman-machine-default"

... never returns

podman machine ls

NAME                     VM TYPE     CREATED         LAST UP             CPUS        MEMORY      DISK SIZE
podman-machine-default*  applehv     40 minutes ago  Currently starting  4           2GiB        100GiB
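
For scripted checks, the stuck state can be read from this listing. The following is a minimal Python sketch, assuming the column layout shown above and that a stuck machine shows "Currently starting" in the LAST UP column. `stuck_machines` and `SAMPLE` are hypothetical names and not part of podman.

```python
# Sample copied from the `podman machine ls` output above.
SAMPLE = """\
NAME                     VM TYPE     CREATED         LAST UP             CPUS        MEMORY      DISK SIZE
podman-machine-default*  applehv     40 minutes ago  Currently starting  4           2GiB        100GiB
"""


def stuck_machines(ls_output):
    """Return names of machines whose row contains 'Currently starting'."""
    stuck = []
    for line in ls_output.splitlines()[1:]:  # skip the header row
        if "Currently starting" not in line:
            continue
        fields = line.split()
        if fields:
            # A trailing '*' marks the default machine; strip it from the name.
            stuck.append(fields[0].rstrip("*"))
    return stuck
```

A machine that is only briefly in this state during a normal start would also match, so a real check would need to look at how long the state persists.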

podman info

OS: darwin/amd64
buildOrigin: pkginstaller
provider: applehv
version: 5.4.0

Cannot connect to Podman. Please verify your connection to the Linux system using `podman system connection list`, or try `podman machine init` and `podman machine start` to manage a new Linux VM
Error: unable to connect to Podman socket: failed to connect: dial tcp 127.0.0.1:61995: connect: connection refused

Any other ideas, or info I can provide to help debug?

smallsaucepan avatar Feb 14 '25 04:02 smallsaucepan

@smallsaucepan try podman machine rm -f to delete the currently stuck podman machine.
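
The remove-and-recreate workaround mentioned in this thread could be scripted roughly as follows. This is a minimal Python sketch, assuming podman is on the PATH and that the machine name is podman-machine-default; `recovery_commands` and `recover` are hypothetical helper names. Be aware that removing a machine also discards whatever was stored inside it.

```python
import subprocess


def recovery_commands(name="podman-machine-default"):
    """Return the argv lists for force-removing and recreating a machine."""
    return [
        ["podman", "machine", "rm", "-f", name],   # force-remove the stuck machine
        ["podman", "machine", "init", name],       # create a fresh machine
        ["podman", "machine", "start", name],      # start it again
    ]


def recover(name="podman-machine-default", run=subprocess.run):
    """Run the recovery commands in order, stopping at the first failure."""
    for argv in recovery_commands(name):
        run(argv, check=True)
```

The `run` parameter exists only so the sequence can be exercised without podman installed, for example by passing a function that records the commands instead of executing them.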

cbr7 avatar Feb 14 '25 07:02 cbr7

Thanks for the suggestion @cbr7. No luck though.

Tried with debug log level on machine start. The logging below appears, and a window opens displaying the GRUB bootloader for a second before displaying "Booting `Fedora CoreOS 41.20...", which then hangs. CPU goes to 400% for at least 20 minutes without any sign of progress in either window.

INFO[0000] boot parameters: &{EFIVariableStorePath:/Users/james/.local/share/containers/podman/machine/applehv/efi-bl-podman-machine-default CreateVariableStore:true}
INFO[0000]
INFO[0000] virtual machine parameters:
INFO[0000]      vCPUs: 4
INFO[0000]      memory: 2048 MiB
INFO[0000]
INFO[0000] Adding virtio-blk device (imagePath: /Users/james/.local/share/containers/podman/machine/applehv/podman-machine-default-amd64.raw)
INFO[0000] Adding virtio-rng device
INFO[0000] Adding virtio-vsock device
INFO[0000] Adding virtio-serial device (logFile: /var/folders/cq/2lcbk22j3j13215bz7y3nf2r0000gn/T/podman/podman-machine-default.log)
INFO[0000] Adding virtio-net device (nat: false macAddress: [5a:94:ef:e4:0c:ee])
INFO[0000] Using unix socket /var/folders/cq/2lcbk22j3j13215bz7y3nf2r0000gn/T/podman/podman-machine-default-gvproxy.sock
INFO[0000] local: /var/folders/cq/2lcbk22j3j13215bz7y3nf2r0000gn/T/podman/vfkit-15915-b2d0.sock remote: /var/folders/cq/2lcbk22j3j13215bz7y3nf2r0000gn/T/podman/podman-machine-default-gvproxy.sock
INFO[0000] Adding virtio-fs device
INFO[0000] Adding virtio-fs device
INFO[0000] Adding virtio-fs device
INFO[0000] Adding virtio-gpu device
INFO[0000] Adding virtio-input pointing device
INFO[0000] Adding virtio-input keyboard device
INFO[0000] virtual machine is running
INFO[0000] Exposing vsock port 1025 on /var/folders/cq/2lcbk22j3j13215bz7y3nf2r0000gn/T/podman/podman-machine-default.sock (listening)
INFO[0000] Exposing vsock port 1024 on /Users/james/.local/share/containers/podman/machine/applehv/podman-machine-default-ignition.sock (listening)
INFO[0000] waiting for VM to stop
2025-02-15 00:56:42.092 vfkit[88341:305874] +[IMKClient subclass]: chose IMKClient_Modern
2025-02-15 00:56:42.092 vfkit[88341:305874] +[IMKInputSession subclass]: chose IMKInputSession_Modern

Appears to get stuck here while CoreOS fails to boot in the other window.

smallsaucepan avatar Feb 14 '25 14:02 smallsaucepan

@smallsaucepan I think you're hitting https://github.com/containers/podman/issues/25121

benoitf avatar Feb 14 '25 14:02 benoitf

@smallsaucepan when all else fails, restart the MacBook/system. Upon restart, the podman machine will be off and you will be able to delete it.

EDIT: also, as @benoitf said, you seem to have a different issue than what is reported in this ticket.

cbr7 avatar Feb 14 '25 14:02 cbr7

Thanks @benoitf and @cbr7. That's a much better fit to what I'm seeing.

smallsaucepan avatar Feb 15 '25 00:02 smallsaucepan

A fix for this that worked for me is mentioned in #9670.

I encountered the situation after installing fresh versions of Desktop and podman's CLI after a six-month hiatus from doing any podman-related work with either. I'm guessing that the state left recorded on disk (Started: true) may have prevented the engine from starting, despite the commit that closed this as completed.

JonathanDoughty avatar May 22 '25 15:05 JonathanDoughty

In what version of podman is this fix supposed to be released? I ask because I was just able to reproduce this using podman 5.5.2.

https://github.com/user-attachments/assets/f63f612d-acf6-4852-b0c0-812586086df4

cbr7 avatar Aug 18 '25 09:08 cbr7

It should be fixed in 5.6

jakecorrenti avatar Aug 18 '25 11:08 jakecorrenti