Cannot create a coherent snapshot on KVM via quiescing when Zenarmor is installed
Important notices
Before you add a new report, we ask you kindly to acknowledge the following:
- [X] I have read the contributing guidelines at https://github.com/opnsense/core/blob/master/CONTRIBUTING.md
- [X] I am convinced that my issue is new after having checked both open and closed issues at https://github.com/opnsense/core/issues?q=is%3Aissue
Describe the bug
Running OPNsense on KVM, I cannot create a quiesced snapshot via libvirt:
virsh snapshot-create opnsense.local --disk-only --atomic --quiesce
fails with the following error:
error: internal error: unable to execute QEMU agent command 'guest-fsfreeze-freeze': failed to freeze /usr/local/zenarmor/output/active/temp: Resource deadlock avoided
Removing --quiesce from the libvirt command works.
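Until quiescing works, a wrapper on the KVM host can try the quiesced snapshot first and fall back to the plain one. This is only a rough sketch: `snapshot_with_fallback` and the `VIRSH` override are hypothetical names introduced here, and the fallback snapshot is not application-aware.

```shell
#!/bin/sh
# Hypothetical helper, not part of libvirt or OPNsense.
# VIRSH can be overridden, e.g. for a dry run with a stub command.
VIRSH="${VIRSH:-virsh}"

# snapshot_with_fallback DOMAIN
# Prints which kind of snapshot was taken; virsh output goes to stderr.
snapshot_with_fallback() {
    domain="$1"
    if $VIRSH snapshot-create "$domain" --disk-only --atomic --quiesce >&2; then
        echo "quiesced"
    elif $VIRSH snapshot-create "$domain" --disk-only --atomic >&2; then
        # Same command as above without --quiesce, which is reported to work.
        echo "not-quiesced"
    else
        echo "failed"
        return 1
    fi
}
```

Calling `snapshot_with_fallback opnsense.local` would then report whether the guest filesystems were actually frozen, which is worth recording next to the snapshot.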
To Reproduce
Steps to reproduce the behavior:
- Install OPNsense with Zenarmor on a KVM guest
- Create a quiesced snapshot via libvirt (virsh snapshot-create with --quiesce)
Expected behavior
Quiescing should run all necessary pre-snapshot freeze and post-snapshot thaw scripts.
Additional context
I've tried to find the freeze/thaw scripts in OPNsense in order to exclude ramdisks (in our case /usr/local/zenarmor/output/active/temp) from quiescing.
I've also looked for them in order to suspend the Zenarmor service until the snapshot is done.
I couldn't find any relevant information in the OS.
Running qemu-ga in OPNsense suggests that it will read the freeze/thaw script in /usr/local/bin/../etc/qemu/fsfreeze-hook if found.
I've made the following changes in OPNsense:
In /etc/rc.conf.d/qemu_guest_agent:
- qemu_guest_agent_flags="-d -l /var/log/qemu-ga.log"
+ qemu_guest_agent_flags="-d -l /var/log/qemu-ga.log -F/usr/local/etc/qemu/fsfreeze-hook"
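The same change can be scripted so that it is safe to run more than once. The sketch below is only an illustration: `add_fsfreeze_flag` is a hypothetical helper, and it assumes the flags are defined on a single double-quoted line as shown in the diff above. If /etc/rc.conf.d/qemu_guest_agent turns out to be regenerated on boot, the edit would be lost and would need to live in whatever generates that file.

```shell
#!/bin/sh
# Hypothetical helper: append -F<hook> to qemu_guest_agent_flags in CONF
# unless the option is already present.
# Usage: add_fsfreeze_flag CONF HOOK
add_fsfreeze_flag() {
    conf="$1"
    hook="$2"
    # Nothing to do if the option is already configured.
    if grep -qF -- "-F${hook}" "$conf"; then
        return 0
    fi
    # Insert the option just before the closing quote of the flags line.
    # A backup of the original file is kept as CONF.bak.
    sed -i.bak "s|^\(qemu_guest_agent_flags=\".*\)\"\$|\1 -F${hook}\"|" "$conf"
}
```

For example, `add_fsfreeze_flag /etc/rc.conf.d/qemu_guest_agent /usr/local/etc/qemu/fsfreeze-hook`, followed by a restart of the guest agent service so the new flag takes effect.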
Then I created the following script in /usr/local/etc/qemu/fsfreeze-hook and made it executable:
#!/bin/sh

LOG_FILE=/var/log/qemu-ga.log
# Static device name found in /usr/local/etc/rc.d/eastpect
ZENARMOR_RAMDISK="/dev/md43"
ZENARMOR_RAMDISK_MOUNTPOINT="/usr/local/zenarmor/output/active/temp"

log () {
    echo "$1" >> "${LOG_FILE}";
}

case "$1" in
    "freeze")
        log "Launching freeze operations"
        if [ -d "${ZENARMOR_RAMDISK_MOUNTPOINT}" ]; then
            log "Zenarmor installed, Stopping engine"
            zenarmorctl engine stop >> "${LOG_FILE}" 2>&1
            umount "${ZENARMOR_RAMDISK_MOUNTPOINT}" >> "${LOG_FILE}" 2>&1
            sleep 1
        fi
        # Return 0 regardless of state, since a pre-stopped engine might return a false code
        log "Freeze operation done"
        exit 0
        ;;
    "thaw")
        log "Launching thaw operations"
        if [ -d "${ZENARMOR_RAMDISK_MOUNTPOINT}" ]; then
            log "Zenarmor installed, starting engine"
            mount "${ZENARMOR_RAMDISK}" "${ZENARMOR_RAMDISK_MOUNTPOINT}" >> "${LOG_FILE}" 2>&1
            zenarmorctl engine start >> "${LOG_FILE}" 2>&1
            sleep 1
        fi
        log "Thaw operation done"
        exit 0
        ;;
    *)
        log "No options given. Nothing will happen. Options are 'freeze' or 'thaw'"
        exit 1
        ;;
esac
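The hook above unmounts and remounts the ramdisk whenever the mountpoint directory exists, even if nothing is mounted there. A possible refinement is to check the mount table first. This is only a sketch: `is_mounted` is a hypothetical helper, and it assumes `mount` prints lines of the form "<device> on <mountpoint> ...".

```shell
#!/bin/sh
# Hypothetical helper: succeed only if the given path is currently a
# mount target according to the output of `mount`.
# Usage: is_mounted MOUNTPOINT
is_mounted() {
    # Match " on <mountpoint> " as a fixed string so that parent or
    # sibling directories of the mountpoint do not match.
    mount | grep -qF -- " on $1 "
}
```

In the freeze branch, the umount call could then be wrapped in `if is_mounted "${ZENARMOR_RAMDISK_MOUNTPOINT}"; then ... fi`, and the thaw branch could skip the mount when the ramdisk is already mounted.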
So far so good: I can now use --quiesce to make my snapshots application-aware.
I am more than willing to make a PR for this issue if @fichtner or @AdSchellevis could have a quick look to make sure I didn't make any mistakes, especially since I don't know whether /etc/rc.conf.d/qemu_guest_agent is generated on boot.
Also, should I make this PR for the qemu_guest_agent plugin instead of core?
Thanks ;)
Relevant forum entry: https://forum.opnsense.org/index.php?topic=38943.0
Environment
I've tried this with all OPNsense versions from 22.7 up to the recent 24.7_5, on multiple hosts, all with KVM.
I discovered an issue with qemu-ga, see https://github.com/opnsense/plugins/issues/4148, so this is now on hold.
This issue has been automatically timed-out (after 180 days of inactivity).
For more information about the policies for this repository, please read https://github.com/opnsense/core/blob/master/CONTRIBUTING.md for further details.
If someone wants to step up and work on this issue, just let us know, so we can reopen the issue and assign an owner to it.