Virtiofs: long delays in restarting/stopping a VM which uses virtiofs
These commits introduce a substantial delay in rebooting or shutting down a VM. I am not sure commit exactly which, as the first one fails to start the virtiofs with a permission denied error. so tested together.
https://github.com/astro/microvm.nix/commit/36e261aa2c15a4020663cef87905f436b15305f7 https://github.com/astro/microvm.nix/commit/a50d1bf0d4a2f23ce2ffc503d65c9423a8a268e1
latest working commit is here: https://github.com/astro/microvm.nix/commit/bb0c6d867bae270288197e01449bba74ac0d381e
As can be seen in the below log for a reboot of the vm sudo systemctl restart [email protected] the shutdown takes 1.5 minutes before the graceful shutdown eventually times out.
...
Aug 05 19:28:50 ghaf-host microvm@admin-vm[1460]: [ OK ] Reached target System Reboot.
Aug 05 19:28:50 ghaf-host microvm@admin-vm[1460]: microvm@admin-vm: terminating on signal 15 from pid 1 (<unknown process>)
Aug 05 19:28:50 ghaf-host supervisord[1189]: virtiofsd-log-store [2024-08-05T15:28:50Z INFO virtiofsd] Client disconnected, shutting down
Aug 05 19:28:50 ghaf-host supervisord[1189]: virtiofsd-ro-store [2024-08-05T15:28:50Z INFO virtiofsd] Client disconnected, shutting down
Aug 05 19:28:50 ghaf-host microvm-virtiofsd@admin-vm[1189]: 2024-08-05 19:28:50,856 INFO exited: virtiofsd-log-store (exit status 0; expected)
Aug 05 19:28:50 ghaf-host systemd[1]: [email protected]: Deactivated successfully.
Aug 05 19:28:50 ghaf-host systemd[1]: Stopping VirtioFS daemons for MicroVM 'admin-vm'...
Aug 05 19:30:21 ghaf-host systemd[1]: [email protected]: Stopping timed out. Terminating.
Aug 05 19:30:21 ghaf-host microvm-virtiofsd@admin-vm[1189]: 2024-08-05 19:30:21,068 WARN exited: notify (terminated by SIGTERM; not expected)
Aug 05 19:30:21 ghaf-host microvm-virtiofsd@admin-vm[1189]: 2024-08-05 19:30:21,069 WARN received SIGTERM indicating exit request
Aug 05 19:30:21 ghaf-host systemd[1]: [email protected]: Control process exited, code=killed, status=15/TERM
Aug 05 19:30:21 ghaf-host systemd[1]: [email protected]: Failed with result 'timeout'.
Aug 05 19:30:21 ghaf-host systemd[1]: Stopped VirtioFS daemons for MicroVM 'admin-vm'.
Aug 05 19:30:21 ghaf-host systemd[1]: [email protected]: Consumed 12.210s CPU time, 49.8M memory peak.
after which it starts to boot again as normal.
Aug 05 19:30:21 ghaf-host systemd[1]: Starting VirtioFS daemons for MicroVM 'admin-vm'...
Aug 05 19:30:21 ghaf-host microvm-virtiofsd@admin-vm[1845]: 2024-08-05 19:30:21,294 INFO Set uid to user 0 succeeded
Aug 05 19:30:21 ghaf-host microvm-virtiofsd@admin-vm[1845]: 2024-08-05 19:30:21,295 INFO supervisord started with pid 1845
Aug 05 19:30:22 ghaf-host microvm-virtiofsd@admin-vm[1845]: 2024-08-05 19:30:22,299 INFO spawned: 'notify' with pid 1860
Aug 05 19:30:22 ghaf-host microvm-virtiofsd@admin-vm[1845]: 2024-08-05 19:30:22,301 INFO spawned: 'virtiofsd-log-store' with pid 1861
Aug 05 19:30:22 ghaf-host microvm-virtiofsd@admin-vm[1845]: 2024-08-05 19:30:22,303 INFO spawned: 'virtiofsd-ro-store' with pid 1862
Aug 05 19:30:22 ghaf-host supervisord[1845]: virtiofsd-log-store [2024-08-05T15:30:22Z INFO virtiofsd] Waiting for vhost-user socket connection...
Aug 05 19:30:22 ghaf-host supervisord[1845]: virtiofsd-ro-store [2024-08-05T15:30:22Z INFO virtiofsd] Waiting for vhost-user socket connection...
Aug 05 19:30:23 ghaf-host microvm-virtiofsd@admin-vm[1845]: 2024-08-05 19:30:23,341 INFO success: notify entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
Aug 05 19:30:23 ghaf-host microvm-virtiofsd@admin-vm[1845]: 2024-08-05 19:30:23,342 INFO success: virtiofsd-log-store entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
Aug 05 19:30:23 ghaf-host microvm-virtiofsd@admin-vm[1845]: 2024-08-05 19:30:23,342 INFO success: virtiofsd-ro-store entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
Aug 05 19:30:23 ghaf-host systemd[1]: Started VirtioFS daemons for MicroVM 'admin-vm'.
@SuperSandro2000 Any ideas?
Maybe supervisord is not stopping after all processes it supervisors where told to stop. Maybe we can convince it with another signal?
Doing some more investigations and logging, I can see the following failures are occurring in the shutdown phase between supervisord and the virtiofsd processes
Aug 09 11:10:45 ghaf-host supervisord[1226]: virtiofsd-log-store [2024-08-09T07:10:45Z INFO virtiofsd] Client disconnected, shutting down
Aug 09 11:10:45 ghaf-host supervisord[1226]: virtiofsd-ro-store [2024-08-09T07:10:45Z INFO virtiofsd] Client disconnected, shutting down
Aug 09 11:10:45 ghaf-host microvm-virtiofsd@admin-vm[1226]: 2024-08-09 11:10:45,931 INFO exited: virtiofsd-log-store (exit status 0; expected)
Aug 09 11:10:45 ghaf-host systemd[1]: [email protected]: Deactivated successfully.
Aug 09 11:10:45 ghaf-host systemd[1]: Stopped MicroVM 'admin-vm'.
Aug 09 11:10:45 ghaf-host systemd[1]: [email protected]: Consumed 24.634s CPU time, 478.8M memory peak.
Aug 09 11:10:46 ghaf-host systemd[1]: Stopping VirtioFS daemons for MicroVM 'admin-vm'...
Aug 09 11:10:46 ghaf-host microvm-virtiofsd@admin-vm[1783]: ERROR: http://localhost:9001 refused connection (already shut down?)
Aug 09 11:10:46 ghaf-host systemd[1]: [email protected]: Control process exited, code=exited, status=1/FAILURE
Aug 09 11:10:46 ghaf-host microvm-virtiofsd@admin-vm[1226]: 2024-08-09 11:10:46,166 WARN exited: notify (terminated by SIGTERM; not expected)
Aug 09 11:10:46 ghaf-host microvm-virtiofsd@admin-vm[1226]: 2024-08-09 11:10:46,166 WARN exited: virtiofsd-ro-store (exit status 1; not expected)
Aug 09 11:10:46 ghaf-host microvm-virtiofsd@admin-vm[1226]: 2024-08-09 11:10:46,166 WARN received SIGTERM indicating exit request
Aug 09 11:10:46 ghaf-host systemd[1]: [email protected]: Failed with result 'exit-code'.
Aug 09 11:10:46 ghaf-host systemd[1]: Stopped VirtioFS daemons for MicroVM 'admin-vm'.