
[RFE] Quiesce filesystem during VM snapshots

Open web2wire opened this issue 5 months ago • 9 comments

Current situation

I have a couple of dozen Flatcar systems in my home lab and have spent quite a lot of time creating a drop-in system for generating their configs. Most are running the latest stable release, with a handful on the beta channel. Everything works very well in general and I'm very happy with it, so big kudos to the devs.

I noticed the other day (though it's probably been happening for a while, if not forever) that when I create snapshots, either manually in vSphere or automatically via my backup solution, the filesystem is never quiesced, even though the quiesce option is selected. There are no errors and everything appears to proceed correctly, but the resulting snapshot does not have the filesystem quiesced. Snapshots taken on Ubuntu VMs, however, are quiesced, with a more or less identical version of open-vm-tools.

Impact

I am unable to make application-consistent backups. My backup software continually reports warnings and failures for the Flatcar systems.

Ideal future situation

The filesystem should be quiesced when requested. If it can't be, a meaningful reason or error should be emitted.

Additional information

I don't know whether there is a rationale that Flatcar has adopted or inherited for this behaviour. Possibly it's working exactly as intended, in which case it would be useful to know the reasoning behind it and to make that information more readily available.

web2wire, Jul 02 '25

Thanks for the issue, I'll try to reproduce it. As we don't have major VMware expertise, could you share clear repro steps? I'd be interested to know how VMware actually freezes the filesystems (maybe you could also share logs of the instance while it's being shut down).

tormath1, Jul 04 '25

So I booted an instance on VMware, took 3 snapshots and tried to restore snapshot 2, and it worked fine with the quiesce option ticked.

tormath1, Jul 07 '25

Thanks for looking into this. The snapshots do generally work; however, the 'quiesce filesystem' option does not seem to have any effect with Flatcar. If I check the option, the snapshot is created without error, but if you then look at the snapshot, it was created without the filesystem being quiesced. If I do this with Ubuntu or other distros, the quiesce setting is applied.

In my case this probably doesn't cause any particular issues, other than the snapshots not being created in the form you would expect. Most of the time the VMs are not especially busy in terms of disk activity, so the snapshots are good enough. However, my backup software uses the snapshot functionality to create backups, and it continually complains that the snapshots are not application-consistent (i.e. the filesystem is not guaranteed to be snapshotted atomically). As a result, the backup is not considered valid, so no older backups get purged and it keeps eating space. I'd like to get to the bottom of why this happens: open-vm-tools is present on the VMs, but something is obviously different or missing between Flatcar and other distros that makes the quiesce operation fail silently (as far as I can tell).

web2wire, Jul 07 '25

I think part of the problem is that /etc/vmware-tools is on a read-only filesystem, so vmtoolsd can't create or update its quiesce_manifest.xml file.
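A quick way to check this from inside the guest is to try creating a file in the directory that /etc/vmware-tools resolves to. The function below is a hypothetical helper (not part of Flatcar or open-vm-tools), a minimal sketch that assumes GNU coreutils (`readlink -f`, `mktemp -p`): it follows the symlink, tries to create and delete a scratch file, and reports the result.

```shell
# Hypothetical helper: prints "writable <dir>" or "read-only <dir>" for the
# directory that the given path resolves to (symlinks followed via readlink -f).
# Returns 2 if the path does not resolve to an existing directory.
can_create_file_in() {
  local dir probe
  dir=$(readlink -f -- "$1") || return 2
  [ -d "$dir" ] || return 2
  # Try to create a uniquely named scratch file, then remove it again.
  if probe=$(mktemp -p "$dir" .write-probe.XXXXXX 2>/dev/null); then
    rm -f -- "$probe"
    echo "writable $dir"
  else
    echo "read-only $dir"
  fi
}
```

For example, `can_create_file_in /etc/vmware-tools` would be expected to print `read-only ...` if the directory really sits on a read-only filesystem. Actually creating a file is used here instead of `test -w` because it exercises the same kind of operation vmtoolsd needs for quiesce_manifest.xml.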

You can enable debug logging for the vmbackup plugin:

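sudo -i

# Copy the linked config directory to a writable location (-L dereferences the symlink)
cp -Lr /etc/vmware-tools /opt/
# Enable debug logging for the vmbackup plugin, written to a log file
cat <<EOF > /opt/vmware-tools/tools.conf
[logging]
log = true
vmbackup.level = debug
vmbackup.handler = file
EOF

# Drop-in that makes vmtoolsd start with the copied config file;
# the empty ExecStart= clears the original command before it is replaced
mkdir -p /etc/systemd/system/vmtoolsd.service.d
cat <<EOF > /etc/systemd/system/vmtoolsd.service.d/10-run-with-config-file.conf
[Service]
ExecStart=
ExecStart=/usr/bin/vmtoolsd -c /opt/vmware-tools/tools.conf
EOF

systemctl daemon-reload
systemctl restart vmtoolsd

exit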

...then create a snapshot and look at your /var/log/vmware-vmbackup-root.log file.
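Rather than reading the whole log, the lines that matter for the manifest problem can be filtered out. This is a minimal sketch with a hypothetical function name; the match string is the `Error opening backup manifest file` warning text that appears in the log output posted later in this thread, and the default path is the log file named above.

```shell
# Hypothetical helper: print vmbackup warnings about the quiesce manifest from a
# vmbackup debug log. Exit status follows grep: 0 if found, 1 if not.
find_manifest_errors() {
  local log=${1:-/var/log/vmware-vmbackup-root.log}
  grep -F -e 'Error opening backup manifest file' -- "$log"
}
```

Running `find_manifest_errors` right after a quiesced snapshot gives a quick yes/no on whether vmtoolsd could write its manifest.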

alliek-mti, Sep 29 '25

Hi @alliek-mti, thanks for your suggestion, which I've now tried. This is the output that was written to the log during the snapshot. It appears you were correct that there is an issue opening the quiesce_manifest.xml file, but I don't know whether there is any other useful info in the output. I wonder if there is a way to get vmtoolsd to use a different, writable path for the manifest?

[2025-09-30T10:15:26.318Z] [   debug] [vmbackup] [11804] Using quiesceApps = 1, quiesceFS = 1, allowHWProvider = 1, execScripts = 1, scriptArg = , timeout = 0, enableNullDriver = 1, forceQuiesce = 0
[2025-09-30T10:15:26.318Z] [   debug] [vmbackup] [11804] Using excludedFileSystems = "(null)", ignoreFrozenFileSystems = 0
[2025-09-30T10:15:26.318Z] [   debug] [vmbackup] [11804] Quiescing volumes: (null)
[2025-09-30T10:15:26.318Z] [   debug] [vmbackup] [11804] *** VmBackup_SendEventNoAbort
[2025-09-30T10:15:26.318Z] [   debug] [vmbackup] [11804] Sending vmbackup event: vmbackup.eventSet reset 0 
[2025-09-30T10:15:26.318Z] [   debug] [vmbackup] [11804] *** VmBackupStartScripts
[2025-09-30T10:15:26.319Z] [   debug] [vmbackup] [11804] Trying to run scripts from /etc/vmware-tools/backupScripts.d
[2025-09-30T10:15:27.319Z] [   debug] [vmbackup] [11804] *** VmBackupAsyncCallback
[2025-09-30T10:15:27.319Z] [   debug] [vmbackup] [11804] *** VmBackupPostProcessCurrentOp
[2025-09-30T10:15:27.319Z] [   debug] [vmbackup] [11804] VmBackupPostProcessCurrentOp: checking VmBackupOnFreeze
[2025-09-30T10:15:27.319Z] [   debug] [vmbackup] [11804] Async request 'VmBackupOnFreeze' completed
[2025-09-30T10:15:27.319Z] [   debug] [vmbackup] [11804] *** VmBackupEnableSyncWait
[2025-09-30T10:15:27.320Z] [   debug] [vmbackup] [11804] Submitted backup start task.
[2025-09-30T10:15:27.320Z] [   debug] [vmbackup] [11828] *** VmBackupSyncDriverStart
[2025-09-30T10:15:28.321Z] [   debug] [vmbackup] [11804] *** VmBackupAsyncCallback
[2025-09-30T10:15:28.321Z] [   debug] [vmbackup] [11804] *** VmBackupPostProcessCurrentOp
[2025-09-30T10:15:28.321Z] [   debug] [vmbackup] [11804] VmBackupPostProcessCurrentOp: checking VmBackupSyncDriverStart
[2025-09-30T10:15:28.321Z] [   debug] [vmbackup] [11804] SyncDriver status: 0
[2025-09-30T10:15:28.321Z] [   debug] [vmbackup] [11804] Async request 'VmBackupSyncDriverStart' completed
[2025-09-30T10:15:28.321Z] [   debug] [vmbackup] [11804] *** VmBackupSyncDriverReadyForSnapshot
[2025-09-30T10:15:28.321Z] [   debug] [vmbackup] [11804] *** VmBackup_SendEventNoAbort
[2025-09-30T10:15:28.321Z] [   debug] [vmbackup] [11804] Sending vmbackup event: vmbackup.eventSet prov.snapshotCommit 0 
[2025-09-30T10:15:28.437Z] [   debug] [vmbackup] [11804] *** VmBackupEnableSync
[2025-09-30T10:15:28.437Z] [   debug] [vmbackup] [11804] *** VmBackupSnapshotDone
[2025-09-30T10:15:28.437Z] [   debug] [vmbackup] [11804] *** VmBackupSyncDriverSnapshotDone
[2025-09-30T10:15:29.438Z] [   debug] [vmbackup] [11804] *** VmBackupAsyncCallback
[2025-09-30T10:15:29.438Z] [   debug] [vmbackup] [11804] *** VmBackupPostProcessCurrentOp
[2025-09-30T10:15:29.438Z] [   debug] [vmbackup] [11804] VmBackupPostProcessCurrentOp: checking VmBackupSyncDriverSnapshotDone
[2025-09-30T10:15:29.438Z] [ warning] [vmbackup] [11804] Error opening backup manifest file /etc/vmware-tools/quiesce_manifest.xml
[2025-09-30T10:15:29.438Z] [   debug] [vmbackup] [11804] Async request 'VmBackupSyncDriverSnapshotDone' completed
[2025-09-30T10:15:29.439Z] [   debug] [vmbackup] [11804] *** VmBackupStartScripts
[2025-09-30T10:15:29.439Z] [   debug] [vmbackup] [11804] Trying to run scripts from /etc/vmware-tools/backupScripts.d
[2025-09-30T10:15:30.440Z] [   debug] [vmbackup] [11804] *** VmBackupAsyncCallback
[2025-09-30T10:15:30.440Z] [   debug] [vmbackup] [11804] *** VmBackupPostProcessCurrentOp
[2025-09-30T10:15:30.440Z] [   debug] [vmbackup] [11804] VmBackupPostProcessCurrentOp: checking VmBackupOnThaw
[2025-09-30T10:15:30.440Z] [   debug] [vmbackup] [11804] Async request 'VmBackupOnThaw' completed
[2025-09-30T10:15:30.440Z] [   debug] [vmbackup] [11804] *** VmBackupEnableCompleteWait
[2025-09-30T10:15:30.440Z] [   debug] [vmbackup] [11804] *** VmBackupFinalize
[2025-09-30T10:15:30.440Z] [   debug] [vmbackup] [11804] *** VmBackup_SendEventNoAbort
[2025-09-30T10:15:30.440Z] [   debug] [vmbackup] [11804] Sending vmbackup event: vmbackup.eventSet req.done 0 
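The `Sending vmbackup event:` lines give a compact summary of the exchange with the host. A minimal sketch for extracting them follows; the function name is hypothetical, and it assumes the trailing number on each line is the event's status code, which is an inference from the log format rather than something documented in this thread. Applied to the log above, it lists `reset`, `prov.snapshotCommit` and `req.done`, each with `0`, which fits the observation that the manifest warning never surfaces as an error.

```shell
# Hypothetical helper: print "<event> <code>" for each vmbackup event sent,
# parsed from lines like "Sending vmbackup event: vmbackup.eventSet reset 0".
list_vmbackup_events() {
  sed -n 's/.*Sending vmbackup event: vmbackup\.eventSet \([^ ]*\) \([0-9]*\).*/\1 \2/p' \
    "${1:-/var/log/vmware-vmbackup-root.log}"
}
```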

web2wire, Sep 30 '25

(NB: Starting from a fresh install, not on top of the above changes.)

The /etc/vmware-tools directory is linked into place by a drop-in at /usr/lib/systemd/system/vmtoolsd.service.d/flatcar-fixups.conf.

So first, we need to neutralize that drop-in, stop the service, and reload the systemd configuration:

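mkdir -p /etc/systemd/system/vmtoolsd.service.d
# An empty ExecStartPre= clears the ExecStartPre= commands inherited from
# flatcar-fixups.conf; the zzz- prefix makes this drop-in sort after it
cat <<EOF > /etc/systemd/system/vmtoolsd.service.d/zzz-dont-link-etc-dir.conf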
[Service]
ExecStartPre=
EOF

systemctl stop vmtoolsd
systemctl daemon-reload

Then replace /etc/vmware-tools with a writable overlay and start the service:

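# Writable upper and work directories for the overlay
mkdir /opt/vmware-tools /opt/vmware-tools-work
# Replace the symlink with an empty directory to serve as the mount point
rm /etc/vmware-tools
mkdir -p /etc/vmware-tools
# Read-only original as the lower layer; changes go to /opt/vmware-tools
mount -t overlay overlay -o lowerdir=/usr/share/flatcar/oem-vmware/vmware-tools,upperdir=/opt/vmware-tools,workdir=/opt/vmware-tools-work /etc/vmware-tools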

systemctl start vmtoolsd

After this, snapshots with "Quiesce guest filesystem" do show "Yes" in vSphere as expected.
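To double-check that the overlay is what is actually mounted on /etc/vmware-tools, the filesystem type can be read from the mount table. `findmnt -no FSTYPE /etc/vmware-tools` from util-linux does this directly; the hypothetical helper below does the same from a /proc/mounts-style file, so it can also be run against a saved copy. It is a minimal sketch that assumes the mount point contains no spaces, since /proc/mounts escapes those as `\040`.

```shell
# Hypothetical helper: print the filesystem type mounted at MOUNTPOINT, reading a
# /proc/mounts-style table (fields: source target fstype options ...).
# If several mounts are stacked on the same target, the last (topmost) wins.
# Returns 1 if nothing is mounted there.
mount_fstype() {
  local mountpoint=$1 table=${2:-/proc/mounts}
  awk -v mp="$mountpoint" '$2 == mp { fs = $3 } END { if (fs == "") exit 1; print fs }' "$table"
}
```

With the overlay in place, `mount_fstype /etc/vmware-tools` should print `overlay`.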

Someone more familiar with Flatcar's build scripting will need to adapt this mechanism and incorporate it into the Flatcar build system to make it a permanent fix.

alliek-mti, Sep 30 '25

The /etc/vmware-tools path is hard-coded in open-vm-tools: https://github.com/vmware/open-vm-tools/blob/master/open-vm-tools/lib/guestApp/guestApp.c#L68

It's used to build the path to the quiesce_manifest.xml file here: https://github.com/vmware/open-vm-tools/blob/master/open-vm-tools/services/plugins/vmbackup/syncManifest.c#L95

No environment variable, no command-line switch, no config file option: there is no way to override it without patching open-vm-tools.

My overlay workaround could be reworked into a systemd mount unit, so that it would at least survive a reboot. The upper and work directories should probably live somewhere under /var.
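Wherever they end up, overlayfs expects the upper and work directories to exist before mounting, and the work directory must be on the same filesystem as the upper directory (which is why the manual steps above create them with mkdir). A minimal sketch of that preparation step, using the hypothetical helper name `prepare_overlay_dirs` and assuming GNU `stat`:

```shell
# Hypothetical helper: create the overlay upper ("diff") and work directories
# under BASE and print the matching overlay mount options for LOWER.
prepare_overlay_dirs() {
  local lower=$1 base=$2
  mkdir -p -- "$base/diff" "$base/work" || return 1
  # overlayfs requires workdir to be on the same filesystem as upperdir.
  if [ "$(stat -c %d -- "$base/diff")" != "$(stat -c %d -- "$base/work")" ]; then
    echo "upper and work dirs are on different filesystems" >&2
    return 1
  fi
  printf 'lowerdir=%s,upperdir=%s,workdir=%s\n' "$lower" "$base/diff" "$base/work"
}
```

For example, `prepare_overlay_dirs /usr/share/flatcar/oem-vmware/vmware-tools /var/lib/vmware/vmware-tools` prints the same option string used in the Options= line of the Butane mount unit later in this thread.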

alliek-mti, Sep 30 '25

Here's a Butane snippet that puts it all together. It requires systemd 256.

---
variant: flatcar
version: "1.0.0"
systemd:
  units:
    - name: etc-vmware\x2dtools.mount
      enabled: true
      contents: |
        [Unit]
        DefaultDependencies=no
        After=ensure-sysext.service
        [Mount]
        What=overlay
        Where=/etc/vmware-tools
        Type=overlay
        Options=lowerdir=/usr/share/flatcar/oem-vmware/vmware-tools,upperdir=/var/lib/vmware/vmware-tools/diff,workdir=/var/lib/vmware/vmware-tools/work
        [Install]
        WantedBy=default.target
    - name: vmtoolsd.service
      dropins:
        - name: zzz-override-flatcar-fixups.conf
          contents: |
            [Service]
            ExecStartPre=
        - name: require-vmware-tools-overlay.conf
          contents: |
            [Unit]
            After=etc-vmware\x2dtools.mount
            Requires=etc-vmware\x2dtools.mount
    - name: vgauthd.service
      dropins:
        - name: zzz-override-flatcar-fixups.conf
          contents: |
            [Service]
            ExecStartPre=
        - name: require-vmware-tools-overlay.conf
          contents: |
            [Unit]
            After=etc-vmware\x2dtools.mount
            Requires=etc-vmware\x2dtools.mount
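The unit name `etc-vmware\x2dtools.mount` follows systemd's rule that a mount unit is named after its mount point: slashes become `-`, so a literal `-` in the path has to be written as `\x2d`. `systemd-escape --path --suffix=mount /etc/vmware-tools` produces the name directly. For illustration only, the hypothetical function below re-implements a simplified subset of that escaping in plain sh (ASCII paths only; it does not collapse repeated slashes or handle the root path `/`), so prefer systemd-escape in practice.

```shell
# Hypothetical, simplified re-implementation of `systemd-escape --path --suffix=mount`:
# strips one leading/trailing "/", maps "/" to "-", keeps [A-Za-z0-9:_.],
# and hex-escapes everything else (plus a leading "."), e.g. "-" -> "\x2d".
escape_mount_unit() {
  local path=$1 out= first=1 ch
  path=${path#/}
  path=${path%/}
  while [ -n "$path" ]; do
    ch=${path%"${path#?}"}   # first character
    path=${path#?}           # remaining characters
    case $ch in
      /) out="$out-" ;;
      [A-Za-z0-9:_]) out="$out$ch" ;;
      .) if [ "$first" = 1 ]; then out="$out\\x2e"; else out="$out."; fi ;;
      *) out="$out$(printf '\\x%02x' "'$ch")" ;;
    esac
    first=0
  done
  printf '%s.mount\n' "$out"
}
```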

alliek-mti, Oct 16 '25

@alliek-mti: this is great, thank you for the investigation! If the daemon expects that path to be writable, then we should rework how we handle /etc/vmware-tools in Flatcar to make it so.

jepio, Oct 17 '25