virtnbdbackup

Only use full backups as base for differential backups

Open SL1dee36 opened this issue 8 months ago • 7 comments

Each diff backup should save all changes that have occurred since the last full backup.

Full → Inc₁ → Inc₂ → Diff (all changes after Full) → Inc₃ → Diff (again all changes after Full)

libvirtnbdbackup/virt/checkpoints.py

    if args.level == "diff":
        # Always use the first checkpoint (the full backup) as parent.
        parentCheckpoint = checkpoints[0]
    # ...

virtnbdrestore

    dataFiles: List[str] = []
    if args.sequence is not None:
        logging.info("Using manual specified sequence of files.")
        logging.info("Disabling redefine and config adjust options.")
        args.define = False
        args.adjust_config = False
        dataFiles = args.sequence.split(",")

        if "full" not in dataFiles[0] and "copy" not in dataFiles[0]:
            logging.error("Sequence must start with full or copy backup.")
            sys.exit(1)
    else:
        dataFiles = lib.getLatest(args.input, "*.data")
        # Skip diff backups when the restore chain is built automatically.
        dataFiles = [f for f in dataFiles if '.diff.' not in os.path.basename(f)]
        if not dataFiles:
            logging.error("No data files (excluding diff backups) found in directory: [%s]", args.input)
            sys.exit(1)

Example workflow

  • Monday: Full (original copy).
  • Tuesday: Inc₁ (changes after Full).
  • Wednesday: Inc₂ (changes after Inc₁).
  • Thursday: Diff₁ (all changes from Full, including Inc₁ + Inc₂).
  • Friday: Inc₃ (changes after Diff₁).
  • Saturday: Diff₂ (all changes from Full, including Inc₁ + Inc₂ + Inc₃).

Restore to Saturday:

  • Full (Monday) + Diff₂ (Saturday).

  • Inc₁, Inc₂, Inc₃ are not required - their changes are already in Diff₂.
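The restore rule in this workflow can be sketched in a few lines of Python. This is only an illustration of the proposed semantics (every diff is cumulative since the last full backup); `restore_chain` and the `(label, kind)` tuples are made-up names for this example and are not part of virtnbdrestore.

```python
from typing import List, Tuple

# One backup is (label, kind); kind is "full", "inc" or "diff".
# The list is assumed to be in chronological order.
Backup = Tuple[str, str]


def restore_chain(backups: List[Backup], target: str) -> List[str]:
    """Return the labels needed to restore up to `target`.

    Assumes the proposed semantics: a diff holds all changes since the
    last full backup, and an inc holds changes since the previous backup.
    """
    labels = [label for label, _ in backups]
    upto = backups[: labels.index(target) + 1]

    # Start from the most recent full backup at or before the target.
    full_idx = max(i for i, (_, kind) in enumerate(upto) if kind == "full")
    chain = [upto[full_idx][0]]
    after_full = upto[full_idx + 1:]

    # The latest diff replaces every inc and diff taken before it.
    diff_idx = max(
        (i for i, (_, kind) in enumerate(after_full) if kind == "diff"),
        default=-1,
    )
    if diff_idx >= 0:
        chain.append(after_full[diff_idx][0])

    # Incrementals taken after that diff still have to be applied.
    chain += [label for label, kind in after_full[diff_idx + 1:] if kind == "inc"]
    return chain


week = [
    ("Full", "full"), ("Inc1", "inc"), ("Inc2", "inc"),
    ("Diff1", "diff"), ("Inc3", "inc"), ("Diff2", "diff"),
]
print(restore_chain(week, "Diff2"))
```

For the Saturday target this yields the full backup plus Diff2 only. A Friday target would need Full, Diff1 and Inc3.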

SL1dee36 avatar Jul 01 '25 14:07 SL1dee36

Currently, a differential backup always refers back to the latest backup available in the chain. I think this is more flexible: why back up changed data multiple times if it is not required? As far as I can see, the downsides of basing differential backups on the latest full backup are:

  • longer backup times
  • more backup data to be saved
  • same backup data being saved multiple times.

Why does the other approach make more sense?

abbbi avatar Jul 01 '25 14:07 abbbi

According to the common definition, a differential backup is a cumulative backup of all changes made since the last full backup, i.e., the differences since the last full backup. The current behavior does not match that definition, so the term can be misleading.

So we have two ways, and they are different:

  • Incremental
  • Differential

and the user can choose between them. It's up to you to decide on the backup logic, but I think it would be nice to have two distinct canonical ways to back up.

SL1dee36 avatar Jul 01 '25 14:07 SL1dee36

The only way I see is to implement a new command line parameter that changes the behavior, so the current workflow is not broken for existing installations.
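A rough sketch of what such an opt-in switch could look like with argparse. The `--diff-base` option and its values are purely hypothetical names invented for this example; no such flag exists in virtnbdbackup, and the `--level` choices listed here are an assumption too.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Toy parser, not the real virtnbdbackup argument parser."""
    parser = argparse.ArgumentParser(prog="backup-sketch")
    parser.add_argument(
        "-l", "--level",
        choices=["full", "inc", "diff", "copy"],  # assumed set of levels
        default="full",
    )
    # Hypothetical option: "latest" keeps today's behavior as the default,
    # "full" would request changes since the last full backup.
    parser.add_argument(
        "--diff-base",
        choices=["latest", "full"],
        default="latest",
        help="what a diff backup is based on (hypothetical option)",
    )
    return parser


args = build_parser().parse_args(["-l", "diff"])
print(args.level, args.diff_base)
```

Because the default stays at the existing behavior, installations that never pass the new option would see no change.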

abbbi avatar Jul 01 '25 14:07 abbbi

The metadata references data that technically no longer exists, so the incremental chain cannot reach the required checkpoint.

The logs show that when restoring to virtnbdbackup.2, the system used:

full.data → inc.1.data → diff.data → stop.

Error: Instead of the last inc.2.data, a diff was mistakenly taken, which could refer to deleted data.

Reason: Diff backups contain metadata that depends on previous incs, which may no longer exist.


Current:

[2025-07-01 17:45:45] INFO root image - getConfig [main]:  Using QCOW options from backup file: [/home/administrator/backup/TTT/vda.virtnbdbackup.2.qcow.json]
[2025-07-01 17:45:45] INFO root server - setup [main]:  Starting local NBD server on socket: [/var/tmp/virtnbdbackup.95250]
[2025-07-01 17:45:45] INFO root server - setup [main]:  Started NBD server, PID: [95262]
[2025-07-01 17:45:45] INFO nbd client - connect [main]:  Waiting until NBD server at [nbd+unix:///vda?socket=/var/tmp/virtnbdbackup.95250] is up.
[2025-07-01 17:45:46] INFO nbd client - connect [main]:  Connection to NBD backend succeeded.
[2025-07-01 17:45:46] INFO root data - _write [main]:  Applying data from backup file [/home/administrator/backup/TTT/vda.full.data] to target file [/tmp/restore/TTT/TTT-vda.qcow2].
[2025-07-01 17:46:01] INFO root data - _write [main]:  End of stream, [10.5GiB] of data processed
[2025-07-01 17:46:01] INFO root data - _write [main]:  Applying data from backup file [/home/administrator/backup/TTT/vda.inc.virtnbdbackup.1.data] to target file [/tmp/restore/TTT/TTT-vda.qcow2].
[2025-07-01 17:46:02] INFO root data - _write [main]:  End of stream, [56.4MiB] of data processed
[2025-07-01 17:46:02] INFO root data - _write [main]:  Applying data from backup file [/home/administrator/backup/TTT/vda.diff.1751376757.data] to target file [/tmp/restore/TTT/TTT-vda.qcow2].
[2025-07-01 17:46:02] INFO root data - _write [main]:  End of stream, [85.4MiB] of data processed
[2025-07-01 17:46:02] INFO root data - _write [main]:  Reached checkpoint [virtnbdbackup.2], stopping

Corrected:

[2025-07-01 17:51:41] INFO root image - getConfig [main]:  Using QCOW options from backup file: [/home/administrator/backup/TTT/vda.virtnbdbackup.2.qcow.json]
[2025-07-01 17:51:41] INFO root server - setup [main]:  Starting local NBD server on socket: [/var/tmp/virtnbdbackup.95728]
[2025-07-01 17:51:41] INFO root server - setup [main]:  Started NBD server, PID: [95740]
[2025-07-01 17:51:41] INFO nbd client - connect [main]:  Waiting until NBD server at [nbd+unix:///vda?socket=/var/tmp/virtnbdbackup.95728] is up.
[2025-07-01 17:51:42] INFO nbd client - connect [main]:  Connection to NBD backend succeeded.
[2025-07-01 17:51:42] INFO root data - _write [main]:  Applying data from backup file [/home/administrator/backup/TTT/vda.full.data] to target file [/tmp/restore/TTT/TTT-vda.qcow2].
[2025-07-01 17:51:58] INFO root data - _write [main]:  End of stream, [10.5GiB] of data processed
[2025-07-01 17:51:58] INFO root data - _write [main]:  Applying data from backup file [/home/administrator/backup/TTT/vda.inc.virtnbdbackup.1.data] to target file [/tmp/restore/TTT/TTT-vda.qcow2].
[2025-07-01 17:51:58] INFO root data - _write [main]:  End of stream, [56.4MiB] of data processed
[2025-07-01 17:51:58] INFO root data - _write [main]:  Applying data from backup file [/home/administrator/backup/TTT/vda.inc.virtnbdbackup.2.data] to target file [/tmp/restore/TTT/TTT-vda.qcow2].
[2025-07-01 17:51:59] INFO root data - _write [main]:  End of stream, [109.4MiB] of data processed
[2025-07-01 17:51:59] INFO root data - _write [main]:  Reached checkpoint [virtnbdbackup.2], stopping

SL1dee36 avatar Jul 01 '25 15:07 SL1dee36

For greater flexibility, we could add a command line option to the virtnbdbackup utility, for example --from virtnbdbackup.0 or another checkpoint.

rmk177 avatar Jul 01 '25 18:07 rmk177

For greater flexibility, we could add a command line option to the virtnbdbackup utility, for example --from virtnbdbackup.0 or another checkpoint.

That could introduce a whole lot of problems if used wrongly.

I think it's more about understanding what the "differential" backup means in the libvirt context.

The checkpoint xml definition for the "parent" setting as outlined in the documentation:

https://libvirt.org/formatcheckpoint.html

is read only. That means you cannot tell the libvirt daemon that a checkpoint has a specific parent. The parent option is tracked by libvirt itself. It is not like you need a checkpoint for a full backup and a checkpoint for a "differential" backup. The change that is introduced by the bug reporter doesn't cause any real change in behavior.

You can set it, like the bug reporter does in the code, but it doesn't change anything behavior-wise and is simply ignored by the checkpoint creation API.

If you use "virsh checkpoint-list --parent", libvirt will show the checkpoint that was created before as "parent", but there is no relation between the two. It also doesn't matter whether the previous checkpoint was used for an incremental or a differential backup. From the moment the checkpoint and its bitmap are created, the bitmap tracks the changes. For a differential backup, that means you don't create a new bitmap during backup but leave the last one in place, so it accumulates the changes until another backup is created that does create a new bitmap. That is by design.

So during a differential backup, the backup utility does not create a new checkpoint/bitmap, but saves the data changed since the last bitmap was created, leaving it in place. It doesn't matter what type of backup the previous checkpoint was used for. Libvirt does not do a "diff" between two checkpoints. It's a bitmap that tracks continuously until you create a new one.

In fact, it is not even necessary to create a new checkpoint for each backup; the code could use one bitmap throughout and simply clear it. But I figured it is saner to have a checkpoint "history". For example, at some point I might introduce code into virtnbdrestore to re-create the checkpoints during restore, so a restored virtual machine could be backed up from the last point on and doesn't need a full backup again.

So in short:

  • an "incremental" backup, in the libvirt sense, will create a new checkpoint/bitmap.
  • a differential backup won't, so the last created checkpoint/bitmap accumulates all changes until a new full or incremental backup is executed.
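The two bullet points can be illustrated with a toy model in Python. This is not libvirt or virtnbdbackup code: `BitmapModel` is a made-up class that treats the active bitmap as a set of dirty block numbers and assumes checkpoints are named virtnbdbackup.N in creation order.

```python
from typing import List, Set


class BitmapModel:
    """Toy model: one active dirty bitmap that records written blocks."""

    def __init__(self) -> None:
        self.active: Set[int] = set()
        self.checkpoints: List[str] = []

    def write(self, block: int) -> None:
        # Every guest write marks the block dirty in the active bitmap.
        self.active.add(block)

    def full(self) -> None:
        # A full backup copies everything and starts the checkpoint chain.
        self.checkpoints = ["virtnbdbackup.0"]
        self.active = set()

    def incremental(self) -> Set[int]:
        # Saves the dirty blocks, then creates a new checkpoint/bitmap.
        saved = set(self.active)
        self.checkpoints.append(f"virtnbdbackup.{len(self.checkpoints)}")
        self.active = set()
        return saved

    def differential(self) -> Set[int]:
        # Saves the dirty blocks but leaves the bitmap in place,
        # so it keeps accumulating changes.
        return set(self.active)


vm = BitmapModel()
vm.full()
vm.write(1)
vm.write(2)
inc1 = vm.incremental()     # blocks 1 and 2, new checkpoint created
vm.write(3)
diff1 = vm.differential()   # block 3 only: changes since the last checkpoint
vm.write(4)
diff2 = vm.differential()   # blocks 3 and 4: the bitmap was not reset
print(inc1, diff1, diff2, vm.checkpoints)
```

In this model the second diff contains everything the first one did, but neither contains blocks 1 and 2, because those were already saved by the incremental backup that created the current checkpoint.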

In this context, a "differential" backup may not really be what the Wikipedia definition says it should be. The wording may not be optimal.

abbbi avatar Jul 09 '25 18:07 abbbi

Each diff backup should save all changes that have occurred since the last full backup.

if args.level == "diff":
    parentCheckpoint = checkpoints[0] #for take ONLY full backup

...

As outlined above, this setting is meaningless and doesn't change behavior on the libvirt side. To facilitate a real diff backup, as mentioned, the backup utility would have to go through all checkpoints/bitmaps (if they still exist and are valid) and back these up again without creating a new bitmap. I'm not quite sure how much sense it makes to implement that (if the backup chain is mixed with incremental backups, the data has already been backed up anyway).

Maybe a better name for "-l diff" or "differential backup" in this context would be "-l latest", meaning "changes since the last backup that has been executed".
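As a rough idea of what going through all bitmaps would mean, here is a minimal Python sketch. It is a toy model only: `changes_since_full` is a hypothetical helper, and each bitmap is represented as a set of block numbers that changed while that checkpoint was the active one.

```python
from typing import Dict, List, Set


def changes_since_full(bitmaps: Dict[str, Set[int]], order: List[str]) -> Set[int]:
    """Union of the dirty blocks of every checkpoint since the full backup.

    `order` lists the checkpoint names starting with the full backup.
    Fails if any bitmap in the chain is missing.
    """
    missing = [name for name in order if name not in bitmaps]
    if missing:
        raise ValueError(f"bitmaps no longer available: {missing}")

    merged: Set[int] = set()
    for name in order:
        merged |= bitmaps[name]
    return merged


bitmaps = {
    "virtnbdbackup.0": {1, 2},   # changed between the full and the first inc
    "virtnbdbackup.1": {2, 3},   # changed between the first and second inc
    "virtnbdbackup.2": {7},      # changed since the second inc
}
order = ["virtnbdbackup.0", "virtnbdbackup.1", "virtnbdbackup.2"]
print(sorted(changes_since_full(bitmaps, order)))
```

Blocks 1, 2 and 3 show up in the result even though the earlier incremental backups already contain them, which is the duplicate data mentioned above.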

abbbi avatar Jul 09 '25 18:07 abbbi