btrbk
Usage with cloud storage like Amazon S3 or Glacier
I would like to set up a BTRFS filesystem with backups, and was happy to find this project. I would like to have the backups sent to a cloud storage solution, however, instead of a hard drive or SSH server. Most of these cloud storage solutions expose RESTful APIs, and you don't have control over the storage medium they use on their end.
Does btrbk support sending backups to an arbitrary REST interface?
> Does btrbk support sending backups to an arbitrary REST interface?
No, this is neither implemented nor planned. If you want to push "target raw" backups to your Amazon S3 storage, you need to somehow mount it locally. You could use s3fs for this, which should do exactly that. So your setup could be something like this:
- mount Amazon S3 using s3fs to /mnt/mys3drive
- configure `target raw /mnt/mys3drive/btrbk_backups/...` in btrbk.conf
If you get this working, please post a note here, so that I can add a section for this to the FAQ.
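For reference, assuming the bucket has already been mounted at /mnt/mys3drive (e.g. with `s3fs mybucket /mnt/mys3drive`; the bucket name is a placeholder), the corresponding btrbk.conf fragment might look like:

```
volume /mnt/btr_pool
  subvolume home
    target raw /mnt/mys3drive/btrbk_backups/
```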
Thanks for the reply! Here's what ended up working for me: AWS block storage is in the same price range per gigabyte as S3, so I created a block storage device and formatted it as BTRFS. I connected the block storage device to a "nano" head node whose only job is to run btrbk. This setup gives me 500 GB of backup storage for about US$20/mo.
Amazon S3 is quite pricey if you are looking only for long-term archival. However, services such as Amazon Glacier don't seem to be easily mountable. It would be convenient if btrbk provided a target type for piping incremental backups into arbitrary commands. Think:

```
volume /mnt/btr_pool
  subvolume home
    target pipe /usr/bin/glacier archive upload my_vault --name={} -
```

where `{}` would expand to the name of the file being passed on stdin, and where the `/usr/bin/glacier` command originates from basak/glacier-cli. It seems trivial to just add
```sh
btrbk run &&
  btrfs send -p `find snapshot_dir/ -mindepth 1 -maxdepth 1 | sort | tail -2` |
  (insert a compression and encryption pipeline) |
  glacier archive upload my_vault --name=`ls snapshot_dir | tail -1`.btrfs -
```
to one's crontab and be done with it, but then you also need to keep a journal of unsuccessful uploads (due to the machine being offline, for example), so that everything gets backed up eventually. This is not an insurmountable task, but direct support for this kind of usage in btrbk would definitely be welcome.
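The journal of unsuccessful uploads mentioned above could be sketched as follows (a minimal local sketch; `upload_cmd` is a stand-in for the real `glacier archive upload` invocation, and the journal path is arbitrary):

```shell
#!/bin/sh
# Sketch: failed snapshot names are appended to $JOURNAL and can be
# retried on the next run, so offline periods are caught up eventually.
JOURNAL=${JOURNAL:-/var/lib/btrbk/glacier.journal}

upload_cmd() {
    # placeholder for: glacier archive upload my_vault --name="$1" -
    cat > /dev/null
}

record_upload() {
    name=$1
    if upload_cmd "$name"; then
        # success: drop the name from the journal, if present
        if [ -f "$JOURNAL" ]; then
            grep -vxF -- "$name" "$JOURNAL" > "$JOURNAL.tmp" || true
            mv "$JOURNAL.tmp" "$JOURNAL"
        fi
    else
        # failure: remember the name so a later run can retry it
        grep -qxF -- "$name" "$JOURNAL" 2>/dev/null || echo "$name" >> "$JOURNAL"
    fi
}
```

A cron job could then walk the journal first and re-pipe any listed snapshots before uploading the newest one.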
This is a nice idea, but it's incomplete: as btrbk is stateless, it always needs to know which subvolumes are already present on the target side. For `target send-receive`, this information is fetched by `btrfs subvolume list`; for `target raw`, the UUIDs are encoded in the filename.
In order to complete this, we should define some data structure: timestamp, UUID, received-UUID, parent-UUID (similar to `btrfs subvolume list`), and then also have a user-defined command which would generate it. btrbk would then parse this data and figure out which subvolumes need to be sent to the target according to the configured target_preserve policy, and which parents to pick for incremental send.
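One possible shape for the output of such a user-defined listing command (the field names and layout here are hypothetical, chosen only to mirror `btrfs subvolume list`):

```
# timestamp      uuid                                  received_uuid  parent_uuid
20240101T0000    1b4e28ba-2fa1-11d2-883f-0016d3cca427  -              -
20240102T0000    6fa459ea-ee8a-3ca4-894e-db77e160355e  -              1b4e28ba-2fa1-11d2-883f-0016d3cca427
```

btrbk could then resolve incremental parents by following the parent_uuid chain.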
PS: sorry for the late reply, I'm really busy with other things at the moment...
My original idea was that btrbk would keep tabs on successful invocations to automatically infer which volumes need sending. If `/usr/bin/glacier archive upload my_vault --name={} -` from my example returned with a zero exit code, btrbk would append `{}` to a list. Note that the user could specify where they want this list stored:
```
volume /mnt/btr_pool
  subvolume home
    target pipe /usr/bin/glacier archive upload my_vault --name={} -
      journal /var/lib/btrbk/glacier
```
Deleted subvolumes could be removed from the list, so that it does not grow ad infinitum.
Yeah well, but then people start deleting files on the target by hand, and the mess with the journal starts...
I guess glacier also provides some sort of directory listing, so if btrbk generated filenames the same way as it does for `target raw`, we could always fetch and parse them the same way.
```
volume /mnt/btr_pool
  subvolume home
    target pipe /usr/bin/glacier archive upload my_vault --name={} -
      list_cmd /usr/bin/glacier <insert list command here> my_vault
```
That would be `/usr/bin/glacier archive list my_vault` in this case. However, my idea was that the `pipe` target would be a fire-and-forget kind of thing. If the user wants to start deleting data from the target, that is not our problem. Suppose I am just piping the data to a mail transfer agent over SMTP, or to a remote shell; I may well not be able to report on what is stored on "the other side". I find this concept more flexible than what you propose.
P.S.: I guess `target pipe` is a slightly confusing name, as it implies that the target is a named pipe. Both `target command` and `target pipeline` would resolve this ambiguity.
> However, my idea was that the pipe target would be a fire-and-forget kind of a thing
Yes, I understand, and I see the benefit in this, but that's not how btrbk works. Maybe we could introduce a new sub-command for this kind of thing, something like `btrbk oneshot`, which would simply create a new snapshot and transfer it (always non-incremental) to the target. The main problem here would be keeping the config consistent and non-confusing. Maybe something like this:
```
volume /mnt/btr_pool
  subvolume home
    target pipe /usr/bin/glacier archive upload my_vault --name={} -
      target_type oneshot
```
> and transfer it (always non-incremental) to the target.
Note that keeping a journal would make it possible to transfer incremental backups even in this setting.
> s3fs
I've been trying to get this to work. There are a number of issues.
- fuse is an operational burden, and Docker doesn't help.
  - fuse in a Docker container requires `--cap-add SYS_ADMIN --device /dev/fuse`, even if it's not exposed outside the container: https://github.com/docker/for-linux/issues/321
  - Exposing fuse across containers requires special host config (`mount --make-shared`).
  - If a fuse app shuts down uncleanly, its mountpoint becomes broken and requires a manual `umount` before it can be used again. Docker does not clean this up automatically.
  - It's not clear fuse issues will be resolved, because it's an inherent design mismatch. Requiring admin access and special configuration to do network storage is a non-starter.
- s3fs is not production-quality.
  - After weeks of testing, I haven't been able to use it to upload large files.
  - The latest release is broken:
    - https://github.com/s3fs-fuse/s3fs-fuse/issues/1941
    - https://github.com/s3fs-fuse/s3fs-fuse/issues/1936
  - Older releases don't support `-o enable_content_md5`, which is required for Backblaze B2, and possibly others.
- s3fs' cache options do not play well with btrbk.
  - s3fs has a metadata cache, but `cat /s3/file; cat /s3/file` will still issue two `HeadObject` requests. This is bad with btrbk, as it reads all the `*.info` files on a raw target on every run.
  - s3fs will cache huge amounts of data to disk during file uploads, rather than streaming them.
- It's not clear s3fs issues will be resolved. Its codebase is undocumented, has heavy copy-paste duplication, uses non-meaningful naming schemes, and interlaces high-level business logic with utility functions. A large portion of it is dedicated to complex ad-hoc manipulation of a userspace cache. The design of this cache is questionable, and I certainly can't get it to perform well. Its user documentation is incoherent.
It would be a huge win if btrbk could use S3 APIs directly. Dozens of cloud providers expose an S3 API now.
The S3 API is a large surface though. Minimal S3 support probably still requires multiple signature versions and autodetection of multipart uploads, and likely other stuff.
In the meantime, I suggest btrbk.conf should offer a set of command endpoints, something like:

```
target pipe
  pipe_target_list_files /usr/local/bin/list_files_from_s3.sh my_bucket
  pipe_target_read_file /usr/local/bin/read_file_from_s3.sh my_bucket
  pipe_target_write_file /usr/local/bin/write_file_to_s3.sh my_bucket
```

The expected interactions would then be just like `target raw`, such that the scripts would be used to read and write `*.info` files in the same patterns currently used.
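For illustration, the three endpoint scripts could be thin wrappers around the AWS CLI (a sketch only; the function names, bucket argument, and `btrbk_backups/` prefix are assumptions taken from the proposal above, and a configured `aws` binary is required):

```shell
#!/bin/sh
# Hypothetical pipe_target_* endpoints as shell functions; each takes the
# bucket as its first argument.

# pipe_target_list_files: print one object key per line under the prefix
list_files_from_s3() {
    aws s3 ls "s3://$1/btrbk_backups/" | awk '{print $NF}'
}

# pipe_target_read_file: stream one object to stdout
read_file_from_s3() {
    aws s3 cp "s3://$1/$2" -
}

# pipe_target_write_file: stream stdin into one object
write_file_to_s3() {
    aws s3 cp - "s3://$1/$2"
}
```

`aws s3 cp` accepts `-` for streaming to/from standard output/input, which is what makes this pattern work without local staging.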
Looking for a similar solution: I just want to push an encrypted archive of a snapshot into S3 long-term storage such as https://www.ovhcloud.com/en-ca/public-cloud/cold-archive/. At $2/month/TB, it's worth it! I guess I can do it another way, but direct integration with btrbk is a must.
Hm.. instead of implementing the whole S3 API ourselves or jumping the gun with custom scripts, how about adding rclone support for uploading and managing files? It seems like it has all the necessary commands, e.g.:
- `rclone rcat` -- can be used to pipe directly into storage.
- `rclone lsf` -- can be used to list current archives in storage.
- `rclone cat` -- can be used to pipe directly out of storage.
The only downside is that rclone has its own config format, which might make it messier than just allowing custom scripts.
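As a sketch, a one-shot upload along the lines discussed earlier could be built on `rclone rcat` (assumptions: `remote:` is a configured rclone remote, snapshots live flat under `snapshot_dir/` with chronologically sortable names, and `zstd` is installed):

```shell
#!/bin/sh
# Pick the newest snapshot in a directory (btrbk snapshot names sort
# chronologically, so lexical sort is enough).
latest_snapshot() {
    ls "$1" | sort | tail -n 1
}

# Hypothetical one-shot upload: send the latest snapshot, compress it, and
# stream it into an rclone remote without any local staging.
upload_latest() {
    name=$(latest_snapshot snapshot_dir)
    btrfs send "snapshot_dir/$name" | zstd |
        rclone rcat "remote:my_bucket/$name.btrfs.zst"
}
```

`rclone lsf remote:my_bucket` would then serve as the listing side, mirroring the `list_cmd` idea discussed above.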
Shameless plug: my simple solution to this problem, https://github.com/kubrickfr/btrfs-send-to-s3