Only use full backups as base for differential backups
Each diff backup should save all changes that have occurred since the last full backup.
Full → Inc₁ → Inc₂ → Diff (all changes after Full) → Inc₃ → Diff (again all changes after Full)
libvirtnbdbackup/virt/checkpoints.py
```python
if args.level == "diff":
    parentCheckpoint = checkpoints[0]  # use ONLY the full backup as base
    # ...
```
virtnbdrestore
```python
dataFiles: List[str] = []
if args.sequence is not None:
    logging.info("Using manual specified sequence of files.")
    logging.info("Disabling redefine and config adjust options.")
    args.define = False
    args.adjust_config = False
    dataFiles = args.sequence.split(",")
    if "full" not in dataFiles[0] and "copy" not in dataFiles[0]:
        logging.error("Sequence must start with full or copy backup.")
        sys.exit(1)
else:
    dataFiles = lib.getLatest(args.input, "*.data")
    # Skip diff backups when the restore utility builds the chain itself.
    dataFiles = [f for f in dataFiles if '.diff.' not in os.path.basename(f)]
    if not dataFiles:
        logging.error("No data files (excluding diff backups) found in directory: [%s]", args.input)
        sys.exit(1)
```
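The file-selection part of that patch can be pulled out into a pure helper so it is testable without a backup directory. This is a minimal sketch: `select_restore_files` is a hypothetical name, the candidate list stands in for what `lib.getLatest()` returns, and it raises `ValueError` instead of calling `sys.exit()` so callers decide how to fail.

```python
import os
from typing import List, Optional


def select_restore_files(candidates: List[str], sequence: Optional[str] = None) -> List[str]:
    """Pick the data files to replay, mirroring the patched logic above.

    candidates: data file paths in chain order (stand-in for lib.getLatest()).
    sequence: optional comma-separated file list, like args.sequence.
    """
    if sequence is not None:
        files = sequence.split(",")
        # A manual sequence must begin with a full or copy backup.
        if "full" not in files[0] and "copy" not in files[0]:
            raise ValueError("Sequence must start with full or copy backup.")
        return files
    # Drop differential backups; only full and incremental files are replayed.
    files = [f for f in candidates if ".diff." not in os.path.basename(f)]
    if not files:
        raise ValueError("No data files (excluding diff backups) found")
    return files


# Example chain modeled on the file names in the restore logs below.
chain = [
    "/backup/TTT/vda.full.data",
    "/backup/TTT/vda.inc.virtnbdbackup.1.data",
    "/backup/TTT/vda.diff.1751376757.data",
    "/backup/TTT/vda.inc.virtnbdbackup.2.data",
]
print(select_restore_files(chain))
```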
Example workflow
- Monday: Full (original copy).
- Tuesday: Inc₁ (changes after Full).
- Wednesday: Inc₂ (changes after Inc₁).
- Thursday: Diff₁ (all changes from Full, including Inc₁ + Inc₂).
- Friday: Inc₃ (changes after Diff₁).
- Saturday: Diff₂ (all changes from Full, including Inc₁ + Inc₂ + Inc₃).
Restore to Saturday:
- Full (Monday) + Diff₂ (Saturday).
- Inc₁, Inc₂, Inc₃ are not required; their changes are already in Diff₂.
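Under the proposed (textbook) semantics, the set of files needed for a restore can be computed by walking the chain in order. A minimal sketch, assuming every diff holds all changes since the last full backup and every inc holds only the changes since the immediately preceding backup; `restore_chain` and the labels are illustrative, not part of virtnbdbackup.

```python
from typing import List, Tuple

# Each backup is (level, label); level is "full", "inc" or "diff".
Backup = Tuple[str, str]


def restore_chain(backups: List[Backup], target: str) -> List[str]:
    """Return the labels that must be applied, in order, to restore `target`."""
    chain: List[str] = []
    for level, label in backups:
        if level != "full" and not chain:
            raise ValueError("chain must start with a full backup")
        if level == "full":
            chain = [label]              # a new full backup restarts the chain
        elif level == "diff":
            chain = chain[:1] + [label]  # a diff supersedes everything after the full
        else:
            chain.append(label)          # an inc depends on the previous backup
        if label == target:
            return chain
    raise KeyError(target)


week = [("full", "Full"), ("inc", "Inc1"), ("inc", "Inc2"),
        ("diff", "Diff1"), ("inc", "Inc3"), ("diff", "Diff2")]
print(restore_chain(week, "Diff2"))  # ['Full', 'Diff2']
print(restore_chain(week, "Inc3"))   # ['Full', 'Diff1', 'Inc3']
```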
Currently, a differential backup always refers back to the latest backup available in the chain. I think this is more flexible: why back up changed data multiple times if it is not required? As I see it, the downsides of basing differential backups on the latest full backup are:
- longer backup times
- more backup data to be saved
- same backup data being saved multiple times.
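The cost difference can be illustrated with a toy count of changed blocks. This sketch assumes hypothetical integer block IDs per day; it compares a cumulative (since-full) backup against one that stores only what changed since the previous backup.

```python
from typing import List, Set


def stored_blocks(daily: List[Set[int]], cumulative: bool) -> List[int]:
    """Number of blocks written by each backup taken after the full backup.

    cumulative=True: each backup stores everything changed since the full
    backup (textbook differential). cumulative=False: each backup stores
    only what changed since the previous backup.
    """
    sizes: List[int] = []
    since_full: Set[int] = set()
    for changed in daily:
        since_full |= changed
        sizes.append(len(since_full) if cumulative else len(changed))
    return sizes


daily = [{1, 2}, {2, 3}, {4}, {1, 5}]  # hypothetical changed blocks per day
print(stored_blocks(daily, cumulative=True))   # [2, 3, 4, 5]
print(stored_blocks(daily, cumulative=False))  # [2, 2, 1, 2]
```

Here the cumulative approach writes 14 blocks in total versus 7, and blocks 1 and 2 are saved again on every later day.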
Why does the other approach make more sense?
According to the standard definition, a differential backup is a cumulative backup of all changes made since the last full backup, i.e., the differences since the last full backup. So the current behavior may rest on a misconception.
So we have two approaches, and they are different:
- Incremental
- Differential

and the user can choose between them. It's up to you to decide the backup logic, but I think it would be nice to have two distinct canonical ways to back up.
The only way I see is to implement a new command line parameter that changes the behavior, so as not to break the current workflow for existing installations.
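An opt-in switch whose default preserves today's behavior could look like the following. This is a minimal sketch: `--diff-base` is a hypothetical flag that does not exist in virtnbdbackup, and the parser is a trimmed-down stand-in, not the tool's real argument definitions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="virtnbdbackup")
    # Trimmed-down stand-in for the real backup level option.
    parser.add_argument("-l", "--level", choices=["full", "inc", "diff"], default="full")
    # Hypothetical opt-in switch; the default keeps the current behavior.
    parser.add_argument(
        "--diff-base",
        choices=["last", "full"],
        default="last",
        help="base for -l diff: last checkpoint (current behavior) or last full backup",
    )
    return parser


args = build_parser().parse_args(["-l", "diff", "--diff-base", "full"])
print(args.level, args.diff_base)  # diff full
```

Keeping `last` as the default means existing cron jobs and scripts that call `-l diff` would see no change.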
The metadata references data that technically no longer exists, so the restore cannot reach the required incremental backup.
The logs show that when restoring to virtnbdbackup.2, the system used:
full.data → inc.1.data → diff.data → stop.
Error: Instead of the last inc.2.data, the diff file was mistakenly used, and it could refer to deleted data.
Reason: Diff backups contain metadata that depends on previous incremental backups, which may no longer exist.
Current:
[2025-07-01 17:45:45] INFO root image - getConfig [main]: Using QCOW options from backup file: [/home/administrator/backup/TTT/vda.virtnbdbackup.2.qcow.json]
[2025-07-01 17:45:45] INFO root server - setup [main]: Starting local NBD server on socket: [/var/tmp/virtnbdbackup.95250]
[2025-07-01 17:45:45] INFO root server - setup [main]: Started NBD server, PID: [95262]
[2025-07-01 17:45:45] INFO nbd client - connect [main]: Waiting until NBD server at [nbd+unix:///vda?socket=/var/tmp/virtnbdbackup.95250] is up.
[2025-07-01 17:45:46] INFO nbd client - connect [main]: Connection to NBD backend succeeded.
[2025-07-01 17:45:46] INFO root data - _write [main]: Applying data from backup file [/home/administrator/backup/TTT/vda.full.data] to target file [/tmp/restore/TTT/TTT-vda.qcow2].
[2025-07-01 17:46:01] INFO root data - _write [main]: End of stream, [10.5GiB] of data processed
[2025-07-01 17:46:01] INFO root data - _write [main]: Applying data from backup file [/home/administrator/backup/TTT/vda.inc.virtnbdbackup.1.data] to target file [/tmp/restore/TTT/TTT-vda.qcow2].
[2025-07-01 17:46:02] INFO root data - _write [main]: End of stream, [56.4MiB] of data processed
[2025-07-01 17:46:02] INFO root data - _write [main]: Applying data from backup file [/home/administrator/backup/TTT/vda.diff.1751376757.data] to target file [/tmp/restore/TTT/TTT-vda.qcow2].
[2025-07-01 17:46:02] INFO root data - _write [main]: End of stream, [85.4MiB] of data processed
[2025-07-01 17:46:02] INFO root data - _write [main]: Reached checkpoint [virtnbdbackup.2], stopping
Corrected:
[2025-07-01 17:51:41] INFO root image - getConfig [main]: Using QCOW options from backup file: [/home/administrator/backup/TTT/vda.virtnbdbackup.2.qcow.json]
[2025-07-01 17:51:41] INFO root server - setup [main]: Starting local NBD server on socket: [/var/tmp/virtnbdbackup.95728]
[2025-07-01 17:51:41] INFO root server - setup [main]: Started NBD server, PID: [95740]
[2025-07-01 17:51:41] INFO nbd client - connect [main]: Waiting until NBD server at [nbd+unix:///vda?socket=/var/tmp/virtnbdbackup.95728] is up.
[2025-07-01 17:51:42] INFO nbd client - connect [main]: Connection to NBD backend succeeded.
[2025-07-01 17:51:42] INFO root data - _write [main]: Applying data from backup file [/home/administrator/backup/TTT/vda.full.data] to target file [/tmp/restore/TTT/TTT-vda.qcow2].
[2025-07-01 17:51:58] INFO root data - _write [main]: End of stream, [10.5GiB] of data processed
[2025-07-01 17:51:58] INFO root data - _write [main]: Applying data from backup file [/home/administrator/backup/TTT/vda.inc.virtnbdbackup.1.data] to target file [/tmp/restore/TTT/TTT-vda.qcow2].
[2025-07-01 17:51:58] INFO root data - _write [main]: End of stream, [56.4MiB] of data processed
[2025-07-01 17:51:58] INFO root data - _write [main]: Applying data from backup file [/home/administrator/backup/TTT/vda.inc.virtnbdbackup.2.data] to target file [/tmp/restore/TTT/TTT-vda.qcow2].
[2025-07-01 17:51:59] INFO root data - _write [main]: End of stream, [109.4MiB] of data processed
[2025-07-01 17:51:59] INFO root data - _write [main]: Reached checkpoint [virtnbdbackup.2], stopping
For greater flexibility, we could add a command line option to the virtnbdbackup utility, for example --from virtnbdbackup.0 or another checkpoint.
That could introduce a whole lot of problems if used wrongly.
I think it's more about understanding what a "differential" backup means in the libvirt context.
The "parent" element of the checkpoint XML definition, as outlined in the documentation:
https://libvirt.org/formatcheckpoint.html
is read-only. That means you cannot tell the libvirt daemon that a checkpoint has a specific parent; the parent is tracked by libvirt itself. It is not as if you need one checkpoint for a full backup and a separate one for a "differential" backup. The change introduced by the bug reporter does not cause any real change in behavior.
You can set it, as the bug reporter does in the code, but it doesn't change anything behavior-wise and is simply ignored by the checkpoint creation API.
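For reference, a checkpoint definition as a client would submit it carries a name and the disks to track, but no parent. A minimal sketch, assuming the element layout from the libvirt formatcheckpoint page; `checkpoint_xml` is a hypothetical helper, and the XML virtnbdbackup actually generates may differ. The resulting string is the kind of document one would pass to libvirt-python's `virDomain.checkpointCreateXML()`.

```python
import xml.etree.ElementTree as ET
from typing import List


def checkpoint_xml(name: str, disks: List[str]) -> str:
    """Build a <domaincheckpoint> document for checkpoint creation.

    No <parent> element is emitted: libvirt reports the parent itself,
    so supplying one does not select which checkpoint is the parent.
    """
    root = ET.Element("domaincheckpoint")
    ET.SubElement(root, "name").text = name
    disks_el = ET.SubElement(root, "disks")
    for disk in disks:
        # checkpoint='bitmap' asks libvirt to track changes for this disk.
        ET.SubElement(disks_el, "disk", name=disk, checkpoint="bitmap")
    return ET.tostring(root, encoding="unicode")


print(checkpoint_xml("virtnbdbackup.2", ["vda"]))
```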
If you use "virsh checkpoint-list --parent", libvirt will show the previously created checkpoint as "parent", but there is no real relation between the two. It also doesn't matter whether the previous checkpoint was used for an incremental or a differential backup. From the moment the checkpoint and its bitmap are created, the bitmap tracks the changes. For a differential backup, that means you don't create a new bitmap during the backup but leave the last one in place, so it accumulates the changes until another backup creates a new bitmap. That is by design.
So during a differential backup, the backup utility does not create a new checkpoint/bitmap, but saves the data changed since the last bitmap was created, leaving that bitmap in place. It doesn't matter what type of backup the previous checkpoint was used for. Libvirt does not compute a "diff" between two checkpoints; it's a bitmap that tracks continuously until you create a new one.
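The behavior described above can be modeled with a few sets. This is a toy model of the explanation in this thread, not libvirt or QEMU code: `ToyDisk` is hypothetical, block IDs are integers, and only the newest bitmap records writes.

```python
from typing import Dict, List, Optional, Set


class ToyDisk:
    """Toy model: one dirty bitmap per checkpoint, only the newest is active."""

    def __init__(self) -> None:
        self.bitmaps: Dict[str, Set[int]] = {}
        self.active: Optional[str] = None

    def write(self, block: int) -> None:
        # Only the most recently created bitmap records new writes.
        if self.active is not None:
            self.bitmaps[self.active].add(block)

    def full(self, checkpoint: str) -> None:
        # Full backup copies everything and starts a fresh bitmap.
        self.bitmaps[checkpoint] = set()
        self.active = checkpoint

    def inc(self, checkpoint: str) -> List[int]:
        # Save the active bitmap's blocks, then start a new checkpoint/bitmap.
        assert self.active is not None, "run a full backup first"
        saved = sorted(self.bitmaps[self.active])
        self.bitmaps[checkpoint] = set()
        self.active = checkpoint
        return saved

    def diff(self) -> List[int]:
        # Save the active bitmap's blocks but leave it in place (no new checkpoint).
        assert self.active is not None, "run a full backup first"
        return sorted(self.bitmaps[self.active])


disk = ToyDisk()
disk.full("virtnbdbackup.0")
disk.write(1); disk.write(2)
print(disk.inc("virtnbdbackup.1"))  # [1, 2]
disk.write(3)
print(disk.diff())                  # [3]
disk.write(4)
print(disk.diff())                  # [3, 4]: still accumulating since virtnbdbackup.1
print(disk.inc("virtnbdbackup.2"))  # [3, 4]
```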
In fact, it is not even necessary to create a new checkpoint for each backup; the code could reuse a single bitmap and simply clear it. But I figured it is saner to keep a checkpoint "history". For example, at some point I might add code to virtnbdrestore that recreates the checkpoints during restore, so a restored virtual machine could be backed up from the last point on and would not need a full backup again.
So in short:
- An "incremental" backup, in the libvirt sense, creates a new checkpoint/bitmap.
- A differential backup does not, so the last created checkpoint/bitmap accumulates all changes until a new full or incremental backup is executed.
In this context, a "differential" backup may not really match the Wikipedia definition. Perhaps the wording is not optimal.
> Each diff backup should save all changes that have occurred since the last full backup.
> `if args.level == "diff": parentCheckpoint = checkpoints[0]  # use ONLY the full backup as base` ...
As outlined above, this setting is meaningless and doesn't change behavior libvirt-wise. To facilitate a real diff backup, as mentioned, the backup utility would have to go through all checkpoints/bitmaps (if they still exist and are valid) and back them up again without creating a new bitmap. I'm not quite sure how much sense that makes to implement: if the backup chain is mixed with incremental backups, the data has already been backed up anyway.
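What such a "real" diff would have to compute is the union of every bitmap from the full backup's checkpoint onward. A minimal sketch, assuming bitmaps are represented as sets of hypothetical block IDs keyed by checkpoint name; `textbook_diff` is illustrative only and does not reflect how libvirt stores or merges bitmaps.

```python
from typing import Dict, List, Set


def textbook_diff(bitmaps: Dict[str, Set[int]], order: List[str], full: str) -> List[int]:
    """Union of all bitmaps from the full backup's checkpoint onward.

    bitmaps: checkpoint name -> blocks recorded while that bitmap was active.
    order: checkpoint names, oldest first.
    Raises ValueError if `full` is unknown, KeyError if a bitmap is missing.
    """
    start = order.index(full)
    merged: Set[int] = set()
    for name in order[start:]:
        merged |= bitmaps[name]  # a deleted or invalid bitmap breaks the merge
    return sorted(merged)


bitmaps = {"virtnbdbackup.0": {1, 2}, "virtnbdbackup.1": {3}, "virtnbdbackup.2": {2, 4}}
order = ["virtnbdbackup.0", "virtnbdbackup.1", "virtnbdbackup.2"]
print(textbook_diff(bitmaps, order, "virtnbdbackup.0"))  # [1, 2, 3, 4]
```

Block 2 appears in two bitmaps but is saved once in the merged result; blocks already covered by incremental backups would still be copied again, which is the redundancy mentioned above.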
Maybe a better name for "-l diff" ("differential backup") in this context would be "-l latest", meaning "changes since the last backup that has been executed".