wyng-backup
wyng-backup copied to clipboard
Fast backups for logical volumes & disk images
Wyng
Fast incremental backups for logical volumes.
Introduction
Wyng is able to deliver faster, more efficient incremental backups for logical volumes. It accesses logical volume metadata (instead of re-scanning data over and over) to instantly find which data has changed since the last backup. Combined with its Time Machine style storage format, Wyng can also prune older backups from the archive very quickly, meaning you only ever have to do a full backup once and can send incremental backups to the same archive indefinitely and frequently.
Speed - Having nearly instantaneous access to volume changes and a nimble archival format enables backing up even terabyte-sized volumes multiple times per hour with little impact on system resources.
Efficiency - Wyng sends data as streams whenever possible, which avoids writing temporary caches of data to disk. And Wyng's ingenious snapshot monitoring avoids common aging snapshot space consumption pitfalls.
Security - Wyng doesn't require the source admin system to ever mount processed volumes, so it safely handles untrusted data in guest filesystems.
Status
Public release v0.3 with a range of features including:
-
Incremental backups of Linux thin-provisioned LVM volumes
-
Supported destinations: Local filesystem, VM or SSH host (plus S3 & SFTP with FUSE)
-
Send, receive, verify and list contents
-
Fast pruning of old backup sessions
-
Basic archive management such as add/delete volume and auto-pruning
-
Data deduplication
-
Marking and selecting snapshots with user-defined tags
Integrated encryption and key-based verification are under development (see notes below for ways to add encryption or try the v0.4 alpha branch).
Wyng is released under a GPL license and comes with no warranties expressed or implied.
Requirements & Setup
Before starting:
-
Thin-provisioning-tools, lvm2, and python >=3.5.4 must be present on the source system. For top performance, the
python3-zstd
package should be installed before creating an archive. -
The destination system should have a Unix-type filesystem with a robust inode implementation (i.e. Ext4 or other fairly recent fs).
-
Volumes to be backed-up must reside in an LVM thin-provisioned pool.
Wyng is currently distributed as a single Python executable with no complex supporting modules or other program files; it can be placed in '/usr/local/bin' or another place of your choosing.
Archives are created with wyng arch-init
:
# wyng arch-init --local=vg/pool --dest=ssh://[email protected]/mnt/bkdrive
...or...
# wyng arch-init --local=vg/pool --dest=internal:/ --subdir=home/user
Wyng 0.3.0 release 20220104
Compression = zstd:7
Hashing = blake2b
Done.
# wyng send my_big_volume
Wyng 0.3.0 release 20220104
Preparing snapshots...
Pairing snapshot for my_big_volume
Sending backup session 20220104-133430 to ssh://[email protected]
100% 859.7M | my_big_volume
The --dest
argument always ends in a mountpoint (mounted volume) absolute path.
In the second example, the destination system has no unique mountpoint in the
desired backup path, so --dest
ends with the root '/' path and the --subdir
argument is supplied to complete the archive path.
The destination mountpoint is automatically checked to make sure its mounted
before executing certain Wyng commands
including send
, receive
, verify
, delete
and prune
.
(See the arch-init
summary below for more details.)
Operation
Run Wyng using the following commands and arguments in the form of:
wyng <parameters> command [volume_name]
Please note that dashed parameters are always placed before the command.
Command summary
Command | Description |
---|---|
list [volume_name] | List volumes or volume sessions. |
send [volume_name] | Perform a backup of enabled volumes. |
receive volume_name | Restore a volume from the archive. |
verify volume_name | Verify a volume against SHA-256 manifest. |
prune [volume_name] | Remove older backup sessions to recover archive space. |
monitor | Collect volume change metadata & rotate snapshots. |
diff volume_name | Compare local volume with archived volume. |
add volume_name | Add a volume to the configuration. |
delete volume_name | Remove entire volume from config and archive. |
rename vol_name new_name | Renames a volume in the archive. |
arch-init | Initialize archive configuration. |
arch-check [volume_name] | Thorough check of archive data & metadata |
arch-delete | Remove data and metadata for all volumes. |
arch-deduplicate | Deduplicate existing data in archive. |
version | Print the Wyng version and exit. |
Parameters / Options summary
Option | Description |
---|---|
--session=date-time[,date-time] | Select a session or session range by date-time or tag (receive, verify, prune). |
--keep=date-time | Specify date-time or tag of sessions to keep (prune). |
--all-before | Select all sessions before the specified --session date-time (prune). |
--autoprune=off | Automatic pruning by calendar date. (experimental) |
--save-to=path | Save volume to path (receive). |
--sparse | Receive volume data sparsely (implies --sparse-write) |
--sparse-write | Overwrite local data only where it differs (receive) |
--remap | Remap volume during send or diff . |
--from=type:location | Retrieve from a specific unconfigured archive (receive, verify, list, arch-init). |
--local=vg/pool | (arch-init) Pool containing local volumes. |
--dest=type:location | (arch-init) Destination of backup archive. |
--subdir=dirname | Optional subdirectory below mountpoint (--from, --dest) |
--compression | (arch-init) Set compression type:level. |
--hashtype | (arch-init) Set hash algorithm: sha256 or blake2b. |
--chunk-factor | (arch-init) Set archive chunk size. |
--dedup | Use deduplication for send (see notes). |
--clean | Perform garbage collection (arch-check) or medata removal (delete). |
--meta-dir=path | Use a different metadata dir than the default. |
--volex=volname[,*] | Exclude volumes (send, monitor, list, prune). |
--force | Needed for arch-delete. |
--verbose | Increase details. |
--quiet | |
-u, --unattended | Don't prompt for interactive input. |
--tag=tagname[,desc] | Use session tags (send, list). |
send
Performs a backup by sending volume data to a new archive session; this is always incremental
unless it is the first time a volume is being sent. Each session under an archival volume represents
the entire contents of the source volume at that time, even if only changed data is sent.
All volumes that were added to the archive will be included unless volume names are specified or
--volex
is used to exclude volumes.
wyng send
Note: A send
operation may refuse to backup a volume if there is not enough space on the
destination. One way to avoid this situation is to specify --autoprune=on
which
will cause Wyng to remove older backup sessions from the archive when space is needed.
receive
Retrieves a volume snapshot (using the latest session ID
if --session
isn't specified) from the archive and saves it to either the volume's
original path or the path specified
with --save-to
. If --session
is used, only one date-time or tag is accepted. The volume
name is required.
wyng --save-to=myfile.img receive vm-work-private
...restores a volume called 'vm-work-private' to 'myfile.img' in the current folder.
Its possible to specify any valid file path or block device. However, note that '/dev/vgname/lvname' is a special form that indicates you are saving to an LVM volume; Wyng will only auto-create LVs for you if the save-to path is specified this way. For any save path, Wyng will try to discard old data before receiving.
Emergency and Recovery situations: The --from
option may be used to
receive from any Wyng archive that is not currently configured in the current
system. It is specified just like the --dest
option of arch-init
, and the
--local
option may also be added to override the LVM settings:
wyng --from=ssh://[email protected]/mountpoint receive my-volume
verify
The verify
command is similar to receive
without saving the data. For both
receive
and verify
modes, an error will be reported with a non-zero exit
code if the received data does not pass integrity checks.
prune
Quickly reclaims space on a backup drive by removing any prior backup session you specify; it does this without re-writing data blocks or compromising volume integrity.
To use, supply a single exact date-time in YYYYMMDD-HHMMSS format to remove a specific session, or two date-times representing a range:
wyng --session=20180605-000000,20180701-140000 prune
...removes backup sessions from midnight on June 5 through 2pm on July 1 for all
volumes. Alternately, --all-before
may be used with a single --session
date-time
to prune all sessions prior to that time.
If volume names aren't specified, prune
will operate across all
enabled volumes.
The --keep
option can accept a single date-time or a tag in the form ^tagID
.
Matching sessions will be excluded from pruning and autopruning.
Less Commonly-used Commands
monitor
Frees disk space that is cumulatively occupied by aging LVM
snapshots, thereby addressing a common resource usage issue with snapshot-based
backups. After harvesting their change metadata, the older snapshots are replaced with
new ones. Running monitor
isn't strictly necessary, but it only takes a few seconds
and is good to run on a frequent, regular basis if you have some volumes that are
very active. Volume names may also be
specified if its desired to monitor only certain volumes.
This rule in /etc/cron.d runs monitor
every 20 minutes:
*/20 * * * * root su -l -c '/usr/local/bin/wyng monitor'
diff
wyng diff vm-work-private
Compare a local volume snapshot with the archive and report any differences.
This is useful for diagnostics and can also be useful after a verification
error has occurred. The --remap
option will record any differences into the
volume's current change map, resulting in those blocks being backed-up on
the next send
.
add
wyng add vm-untrusted-private
Adds a new entry to the list of volumes configured for backup. Volume will be backed up
by future send
commands.
delete
wyng delete vm-untrusted-private
Removes a volume's wyng-managed snapshots, config and metadata from the source system and all of its data from the destination archive (everything deleted except the source volume). Use with caution!
An alternate form of delete
will remove all Wyng archive-related metadata (incl. snapshots) from the
local system without affecting the archive on the destination:
wyng delete --clean
Alternately, using delete --clean --all
will remove all Wyng metadata from the local system, including
snapshots from any Wyng archive (not just the currently configured archive).
rename
wyng rename oldname newname
Renames a volume 'oldname' in the archive to 'newname'. Note: This will rename only the archive volume, not your source volume.
arch-deduplicate
De-duplicates the entire archive by removing repeating patterns. This can save space on the destination's drive while keeping the archived volumes intact.
De-duplication can also be performed on an ongoing basis by using --dedup
with send
.
wyng --dedup arch-deduplicate
arch-init
Initialize a new backup archive configuration...
wyng --local=myvg/mypool --dest=internal:/mountpoint arch-init
Initialize a new backup archive with storage parameters...
wyng --local=myvg/mypool --dest=internal:/mpoint --chunk-factor=3 --hashtype=blake2b arch-init
Import a configuration from an existing archive...
wyng --from=internal:/mountpoint arch-init
arch-check
Intensive check of archive integrity, reading each session completely starting with
the newest and working back to the oldest. This differs from verify
which first bulids a complete
index for the volume and then checks only/all data referenced in the index.
Using --session=newest
provides a 'verify the last session' function (useful after an incremental
backup). Otherwise, supplying a date-time will make arch-check
start the check from that point and
then continue working toward the oldest session. Session ranges are not yet supported.
Depending on how arch-check
is used, the verification process can be shorter or much longer
than using verify
as the latter is always the size of a volume snapshot. The longest, most
complete form arch-check
is to supply no parameters, which checks all sessions in all volumes.
arch-delete
Deletes the entire archive on the destination, and all data that was saved in it; also removes archive metadata from the source system. Use with caution!
wyng --force arch-delete
Options/Parameters for arch-init
--local
takes the source volume group and pool as 'vgname/poolname' for the arch-init
command.
These LVM objects don't have to exist before using arch-init
but they will
have to be there before using send
.
--dest
when using arch-init
, describes the location where backups will be sent.
It accepts one of the following forms, always ending in a mountpoint path:
Note: --local and --dest are required if not using --from.
--from
accepts a URL like --dest
, but retrieves the configuration from an existing archive.
This imports the archive's configuration and can permanently save it as the
local configuration. This option can also be used with: list, receive and verify commands.
Note: You can override the archive's LVM settings by specifying --local
.
URL Form | Destination Type |
---|---|
internal:/path | Local filesystem |
ssh://[email protected]/path | SSH server |
qubes://vm-name/path | Qubes virtual machine |
qubes-ssh://vm-name:[email protected]/path | SSH server via a Qubes VM |
--subdir
In conjunction with --dest
or --from
, allows you to specify a subdirectory
below the mountpoint.
--compression=zstd:3
accepts the form type
or type:level
. The three types available are
the default zstd
, plus zlib
and bz2
. Note that Wyng will only default to zstd
when the
'python3-zstd' package is installed; otherwise it will fall back to the less capable zlib
.
--hashtype=blake2b
accepts a value of either 'sha256' or 'blake2b' (the default).
The digest size used for blake2b is 256 bits. Note that with Python 3.5 the hashtype will
fall back to sha256 as blake2b was introduced in Python 3.6.
--chunk-factor=1
sets the pre-compression data chunk size used within the destination archive.
Accepted range is an integer exponent from '1' to '6', resulting in a chunk size of 64kB for
factor '1', 128kB for factor '2', 256kB for factor '3' and so on. To maintain a good
space efficiency and performance balance, a factor of '2' or greater is suggested for archives
that will store volumes larger than about 100GB.
Note that compression, hashtype and chunk-factor cannot be changed for an archive once it is initialized.
Options
--session=<date-time>[,<date-time>]
OR
--session=^<tag>[,^<tag>]
Session allows you to specify a single date-time or tag spec for thereceive
, verify
, diff
,
and arch-check
commands. Using a tag selects the last session having that tag. When specifying
a tag, it must be prefixed by a ^
carat.
For prune
, specifying
a tag will have different effects: a single spec using a tag will remove only each individual session
with that tag, whereas a tag in a dual (range) spec will define an inclusive range anchored at the first
instance of the tag (when the tag is the first spec) or the last instance (when the tag is the
second range spec). Also, date-times and tags may be used together in a range spec.
--volex=<volume>[,volume,*]
Exclude one or more volumes from processing. May be used with commands that operate on multiple
volumes in a single invocation, such as send
.
--sparse-write
Used with receive
, this option does not prevent Wyng from overwriting existing local volumes!
The sparse-write mode merely tells Wyng not to create a brand-new local volume for receive
, and
results in the data being sparsely written into the volume instead. This is useful if the existing
local volume is a clone/snapshot of another volume and you wish to save local disk space. It is also
best used when the backup/archive storage is local (i.e. fast USB drive or similar) and you don't
want the added CPU usage of full --sparse
mode.
--sparse
The sparse mode can be used with the receive
command to intelligently overwrite an existing
local volume so that only the differences between the local and archived volumes will be fetched
from the archive and written to the local volume. This results in reduced remote disk and network
usage while receiving at the expense of some extra CPU usage on the local machine, and also uses
less local disk space when snapshots are a factor (implies '--sparse-write`).
--dedup
When used with the send
command, data chunks from the new backup will be sent only if
they don't already exist somewhere in the archive. If its a duplicate, the chunk will be
linked instead of sent and stored, saving disk space and possibly time and bandwith.
The tradeoff for deduplicating is longer startup time for Wyng, in addition to using more
memory and CPU resources during backups. Using --dedup
works best if you are backing-up
multiple volumes that have a lot of the same content and/or you are backing-up over a slow
Internet link.
--autoprune=(off | on | min | full)
(experimental)
Autoprune may be used with either the prune
or send
commands and will cause Wyng to
automatically remove older backup sessions according to date criteria. When used with send
specifically, the autopruning process will only be triggered if the destination filessytem is
low on free space.
The criteria are currently hard-coded to remove all sessions older than 366 days, and to thin-out the number of sessions older than 32 days down to a rate of 2 sessions every 7 days. In the future these parameters can be reconfigured by the user.
Selectable modes are:
off is the current default.
on removes more sessions than min as space is needed, while trying to retain any/all older sessions whenever available storage space allows.
min removes sessions before the 366 day mark, but no thinning-out is performed.
full removes all sessions that are due to expire according to above criteria.
--tag=<tagname[,description]>
With send
, attach a tagname of your choosing to the new backup session/snapshot; this may be
repeated on the command line to add multiple tags. Specifying an empty '' tag will cause Wyng
to ask for one or more tags to be manually input; this also causes list
to display tag
information when listing sessions.
Tips
-
Its recommended to avoid backing up sensitive data to untrusted storage -- exercise caution and add encryption where necessary.
-
To reduce the size of incremental backups it may be helpful to remove cache files, if they exist in your source volume(s). Typically, the greatest cache space consumption comes from web browsers, so volumes holding paths like /home/user/.cache can impacted by this, depending on the amount and type of browser use associated with the volume. Three possible approaches are to clear caches on browser exit, delete /home/user/.cache dirs on system/container shutdown (this reasonably assumes cached data is expendable), or to mount .cache on a separate volume that is not configured for backup.
-
Another factor in space/bandwidth use is how sparse your source volumes are in practice. Therefore it is best that the
discard
option is used when mounting your volumes for normal use. -
The chunk size that your LVM thin pool was initialized with can also affect disk space and I/O used when sending backups. Larger LVM chunk sizes can mean larger incremental backups for volumes with lots of random writes. To see the chunksize for your pool(s) run
sudo lvs -o name,chunksize
. Common sizes are 128-512kB so if random writes are prevalent (i.e. for large databases or mail archives) then using Wyng deduplication (which resolves at 64kB by default) can reduce the size of your backup sessions.
Troubleshooting notes
- Backup sessions shown in
list
output may be seemingly (but not actually) out of order if the system's local time shifts substantially between backups, such as when moving between time zones (including DST). If this results in undesired selections with--session
ranges, its possible to nail down the precisely desired range by observing the output oflist volumename
and using exact date-times from the listing.
Encryption options
Wyng is slated to integrate encryption in the future. In the meantime, here are some encryption approaches you can use to secure your backup archives:
-
Regular Linux systems:
Many options exist for mounting an encrypted filesystem on a local backup drive. Some examples you'll find use
gnome-disks
to format a partition as Ext4 on LUKS or VeraCrypt, or they use encrypted filesystems like Encfs or Cryfs. These create a local filesystem mountpoint, so configuring Wyng with an 'internal:/path' destination will suffice.For remote backups on untrusted servers, use one of the above encryption options on a shared folder (Encfs, Cryfs) or disk image file (LUKS, VeraCrypt).
For remote backups where the server is trusted (i.e. encrypted and secured) it is possible to forgo setup of encrypted storage on your local computer and just specify 'ssh://user@address/path' for your Wyng destination.
-
Virtualized host systems using Xen, KVM or other hypervisors:
Option A) From the admin/storage VM, setup Wyng with an 'ssh://' destination where you wish to store the archive. This destination may be a local guest VM or a remote server.
Option B) For hypervisors that support attachment of block devices to different VMs: An encrypted block dev can be attached directly to the admin/storage VM where it is then decrypted and mounted. This requires only an 'internal:/path' destination and benefits from not trusting a guest VM or remote server with handling encryption, but performance may be slower due to filesystem-network overhead.
-
Qubes OS: A brief description for dom0-encrypted remote storage from a Qubes laptop:
-
Qube remotefs runs
sshfs
or other file sharing to access a remote filesystem and thenlosetup
on a remote file (to size the file correctly during inital setup, usetruncate -s <size> mydisk.img
before usinglosetup
). -
Domain0 runs
qvm-block attach dom0 remotefs:loop0
. -
Domain0 runs
cryptsetup
on /dev/xvdi to create/access the volume in its encrypted form. Finally, the resulting /dev/mapper device can be mounted for use. -
Setup Wyng on Domain0 with
--dest=internal:/path
pointing to the mounted path.
As an alternative to the above, if you have a trusted backup qube handling encryption, you can easily setup Wyng in dom0 with a 'qubes://vm-name/path' destination. Also, for Qubes OS where you have both a trusted backup VM and trusted server, you can backup to the server via the backup VM with a 'qubes-ssh://vm-name:user@address/path' destination. These qubes options can achieve faster performance than the above
qvm-block attach
setup, but they move archive encryption out of Domain 0.Local USB storage is relatively simple: Attach the drive to dom0 then encrypt/decrypt it and then mount it.
-
Authentication options
Some of the above encryption methods have [options](https://www.kernel.org/doc/html/latest/admin-guide/device-mapper/dm-integrity.html) that enable authentication.
Wyng archives can also be authenticated by signing the metadata. For example:
$ # Sign #
$ find /var/lib/wyng.backup -name archive.ini -o -name manifest -o -name '*info' | xargs b2sum >hashes
$ gpg --sign hashes
$ # Verify #
$ gpg --verify hashes && b2sum -c hashes && echo Archive OK.
Donations
If you like Wyng or my other efforts, monetary contributions are welcome and can be made through Liberapay or Buy Me a Coffee.
External Links
Some other tools that use LVM metadata:
lvmsync (ruby). Synchronize logical volumes.
lvm-thin-sendrcv (java). Synchronize logical volumes.
thinp-test-suite (ruby). POC backup program.