
Support rclone as a backend

Open · jtagcat opened this issue Aug 04 '20 · 13 comments

It would be very nice to see rclone support. It supports many, many (sub)backends among other features.

For examples / guidance on implementation, see Restic's process. The best idea would be to express interest on rclone forums.

jtagcat · Aug 04 '20 00:08

Thanks for the suggestion! I actually looked into both restic and rclone when I started this project, however, neither gave the control I wanted in terms of ensuring the integrity of the uploaded objects.

I'd be happy to take a look again - do you have any ideas on how you'd like to see this implemented? Something like rclone+backend:// for the URI?

What specific storage targets are you hoping to see supported?

someone1 · Aug 04 '20 02:08

The URI is a bit complicated. rclone also has its own URI syntax (remoteAKABackend:subdirectory/inception).

I think the best way is to do it like restic:

  • rest:https://something/rest-server
  • local:/tank/restic
  • rclone:remoteAKABackend:subdirectory/inception

So in zb-g, it would be rclone://remoteAKABackend:subdirectory/inception\ escaping\ all\ the\ things\!

rclone also often needs other parameters; restic uses -o rclone.args for that. They can also be set via environment variables.
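
(If a scheme like that were adopted, one option is to treat everything after the prefix as an opaque rclone remote spec rather than run it through a URL parser. A minimal sketch, assuming a hypothetical rclone:// scheme that zfsbackup-go does not currently have:)

package backends

import (
	"fmt"
	"strings"
)

// parseRcloneTarget strips the hypothetical rclone:// prefix and hands the
// remainder to rclone untouched, since rclone's own remote:path syntax
// would be mangled by a general-purpose URL parser.
func parseRcloneTarget(uri string) (string, error) {
	const scheme = "rclone://"
	if !strings.HasPrefix(uri, scheme) {
		return "", fmt.Errorf("not an rclone target: %q", uri)
	}
	return strings.TrimPrefix(uri, scheme), nil // e.g. "remoteAKABackend:subdirectory/inception"
}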

I think the best place to start would be to ask whether zb-g would use rclone mount (which I would advise against; there are many caveats). rclone commands such as copy and delete/prune should be used instead.
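
(Shelling out to the rclone binary rather than mounting could look roughly like the sketch below. copyto and deletefile are real rclone subcommands; the wrapper functions and the extraArgs pass-through for user-supplied flags are assumptions:)

package backends

import (
	"context"
	"os/exec"
)

// rcloneCopy uploads a single local file to the remote using the real
// `rclone copyto` subcommand; extraArgs is where user-supplied rclone
// flags (the -o rclone.args idea above) could be passed through.
func rcloneCopy(ctx context.Context, localPath, remotePath string, extraArgs ...string) error {
	args := append([]string{"copyto", localPath, remotePath}, extraArgs...)
	return exec.CommandContext(ctx, "rclone", args...).Run()
}

// rcloneDelete removes a single remote file via `rclone deletefile`.
func rcloneDelete(ctx context.Context, remotePath string, extraArgs ...string) error {
	args := append([]string{"deletefile", remotePath}, extraArgs...)
	return exec.CommandContext(ctx, "rclone", args...).Run()
}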

What specific storage targets are you hoping to see supported?

gdrive / gdrive behind rclone crypt, the classic ;)

I might also help on the actual code side, but 'might be done tomorrow or in 6 months' probably applies.

Also, cc @ncw

jtagcat · Aug 04 '20 02:08

I have just started to look into this project (i.e. zfsbackup-go) and was wondering about the exact same question/feature request. (Obviously, I am using ZFS as my main filesystem, but I have also been using restic and rclone otherwise - two excellent utilities).

@someone1: What are your requirements/wishes regarding control and ensuring the integrity of the uploaded objects?

As you indicate in https://github.com/someone1/zfsbackup-go/issues/1, verifying that the uploaded file is correct is a strict requirement. However, many cloud storage platforms support some sort of hashing, and you can ask rclone to verify that the checksums of the local and remote files match (https://rclone.org/docs/#c-checksum). For an overview of which cloud storage providers support hashing, consult the summary table here: https://rclone.org/overview/
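
(As a concrete example, rclone's check subcommand compares a source tree against a destination, using the backend's hashes where it supports them and falling back to sizes otherwise. The paths here are placeholders:)

rclone check /tank/backups remote:backups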

Most (consumer-grade) cloud storage providers don't make any guarantees regarding durability, but I believe that is a separate issue each user has to decide on for themselves.

Personally, I would be interested in a pCloud backend, just in case you are keeping tabs here.

awehrfritz · Aug 25 '20 16:08

I would try to integrate rclone's backends within the code, not rely on it as an external dependency - I think it's feasible, and the project's licensing should allow it. Though maybe it'd be faster/easier to just integrate with the command-line utility...

You could use rclone today if you decide to use the file:// target - though it'd be somewhat hacky/tricky if you're limited on local disk space.

someone1 · Aug 26 '20 02:08

I would try to integrate rclone's backends within the code, not rely on it as an external dependency - I think it's feasible, and the project's licensing should allow it. Though maybe it'd be faster/easier to just integrate with the command-line utility...

OK, I see. I guess that could work. Would you then target only specific rclone-backends or all of them in an abstracted way?

You could use rclone today if you decide to use the file:// target - though it'd be somewhat hacky/tricky if you're limited on local disk space.

Well, yeah, one could just mount the remote storage using rclone and then use the file target in zfsbackup. That would avoid the disk space issue, but zfsbackup would not know whether the files were correctly transferred (or, even worse, it would try to compute checksums on a file in a remote storage location). Not sure if that is a good idea; I am not yet that familiar with zfsbackup and haven't fully understood when and where checksums are computed (the documentation is rather scarce on this topic).

awehrfritz · Aug 26 '20 03:08

Embedding rclone is on restic's to-do list as well (deprecating/aliasing the backends that already exist in restic, for less maintenance; restic aims to be 'no dependency needed'). I'd still like to see @ncw in the conversation on how we should go about this.

jtagcat · Aug 26 '20 12:08

OK, I see. I guess that could work. Would you then target only specific rclone-backends or all of them in an abstracted way?

All of them, probably abstracted. Otherwise, you'll get a feature request for each backend every month or so.

jtagcat · Aug 26 '20 12:08

@someone1 or @jtagcat: Is there an option currently implemented to verify that all the chunks in the remote storage location are still correct?

This could be done by obtaining the checksums of the chunks in the remote storage and verifying them against a list maintained on the local system. This would help to identify issues with the remote backups early on, especially for less durable storage.
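
(A sketch of what that could look like, built on the real `rclone md5sum` subcommand; the manifest map is an assumption, and filenames containing spaces would need more careful parsing:)

package verify

import (
	"context"
	"fmt"
	"os/exec"
	"strings"
)

// verifyRemote compares the checksums reported by the remote (via the
// real `rclone md5sum` subcommand, which prints "<md5>  <name>" lines)
// against a locally kept manifest of chunk name -> md5.
func verifyRemote(ctx context.Context, remote string, manifest map[string]string) error {
	out, err := exec.CommandContext(ctx, "rclone", "md5sum", remote).Output()
	if err != nil {
		return err
	}
	for _, line := range strings.Split(strings.TrimSpace(string(out)), "\n") {
		fields := strings.Fields(line)
		if len(fields) < 2 {
			continue
		}
		sum, name := fields[0], fields[1]
		if want, ok := manifest[name]; ok && want != sum {
			return fmt.Errorf("checksum mismatch for %s: remote %s, manifest %s", name, sum, want)
		}
	}
	return nil
}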

awehrfritz · Aug 31 '20 03:08

@awehrfritz - Not exactly, the assumption for all currently supported backends is that they are reliable/durable. It is beyond the scope of this application to validate remote storage.

Checksums are sent to storage providers at upload time, in addition to being kept in the manifest files. This way the remote storage provider can validate the objects they receive, and zfsbackup-go can validate the objects at download. We should know at time of upload, and again at time of download, that the objects still match the checksums computed on them.
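
(The pattern being described is roughly: hash the stream while uploading it, then record the digest. A generic sketch; the upload callback is a placeholder, not zfsbackup-go's actual API:)

package backends

import (
	"crypto/md5"
	"encoding/hex"
	"io"
)

// uploadWithChecksum hashes the volume stream while it is being uploaded,
// so the digest can be handed to the provider (e.g. as a Content-MD5
// header) and recorded in the manifest. The upload callback is a placeholder.
func uploadWithChecksum(src io.Reader, upload func(io.Reader) error) (string, error) {
	h := md5.New()
	if err := upload(io.TeeReader(src, h)); err != nil {
		return "", err
	}
	return hex.EncodeToString(h.Sum(nil)), nil
}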

If you do not trust the remote storage target, then you can try and bump #1 for parity archives, or select multiple targets to store your backups in.

someone1 · Aug 31 '20 15:08

Hi - rclone author here.

Checksums are sent to storage providers at upload time, in addition to being kept in the manifest files. This way the remote storage provider can validate the objects they receive, and zfsbackup-go can validate the objects at download. We should know at time of upload, and again at time of download, that the objects still match the checksums computed on them.

This is exactly how rclone works - checksums are validated on upload and download. Rclone doesn't store the checksums for local files; it recalculates them when it needs them.

Not all rclone backends support checksums - you can see which backend supports which checksum.

Could zfsbackup-go give different sorts of checksums to rclone? I see in the VolumeInfo struct there are multiple checksums available - rclone could use these quite easily.

I'm not 100% clear on exactly how zfsbackup-go works, but I think I'm right in saying it streams data from ZFS, chunks it up into ~10MiB chunks, and uploads them. This is very similar to restic's workflow.
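
(The chunking half of that workflow, as a generic sketch; the ~10MiB figure is the one mentioned above, and the emit callback stands in for "upload this volume":)

package backends

import "io"

// chunkStream splits a stream (e.g. `zfs send` output) into fixed-size
// pieces, calling emit once per chunk. io.ReadFull returns io.EOF when
// the stream is exhausted and io.ErrUnexpectedEOF on a final short chunk.
func chunkStream(r io.Reader, emit func([]byte) error) error {
	const chunkSize = 10 << 20 // ~10 MiB
	buf := make([]byte, chunkSize)
	for {
		n, err := io.ReadFull(r, buf)
		if n > 0 {
			if emitErr := emit(buf[:n]); emitErr != nil {
				return emitErr
			}
		}
		if err == io.EOF || err == io.ErrUnexpectedEOF {
			return nil
		}
		if err != nil {
			return err
		}
	}
}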

It would be straightforward to implement the backend interface, I think. It looks to be an abstraction over an S3-like interface. This interface is fairly similar to the one defined for the restic backend too...

type Backend interface {
	Init(ctx context.Context, conf *BackendConfig, opts ...Option) error  // Verifies settings required for backend are present and valid, does basic initialization of backend
	Upload(ctx context.Context, vol *files.VolumeInfo) error              // Upload the volume provided
	List(ctx context.Context, prefix string) ([]string, error)            // Lists all files in the backend, filtering by the provided prefix.
	Close() error                                                         // Release any resources in use
	PreDownload(ctx context.Context, objects []string) error              // PreDownload will prepare the provided files for download (think restoring from Glacier to S3)
	Download(ctx context.Context, filename string) (io.ReadCloser, error) // Download the requested file that can be read from the returned io.ReaderCloser
	Delete(ctx context.Context, filename string) error                    // Delete the file specified on the configured backend
}
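
(For illustration, a few of those methods could be backed by the rclone CLI directly. cat, lsf and deletefile are real rclone subcommands; the RcloneBackend struct, its remote field, and the simplified error handling are assumptions:)

package backends

import (
	"context"
	"io"
	"os/exec"
	"strings"
)

// RcloneBackend is a hypothetical implementation of part of the interface
// above, shelling out to the rclone binary.
type RcloneBackend struct {
	remote string // e.g. "gdrive:zfsbackups"
}

func (b *RcloneBackend) Delete(ctx context.Context, filename string) error {
	return exec.CommandContext(ctx, "rclone", "deletefile", b.remote+"/"+filename).Run()
}

func (b *RcloneBackend) Download(ctx context.Context, filename string) (io.ReadCloser, error) {
	cmd := exec.CommandContext(ctx, "rclone", "cat", b.remote+"/"+filename)
	out, err := cmd.StdoutPipe()
	if err != nil {
		return nil, err
	}
	if err := cmd.Start(); err != nil {
		return nil, err
	}
	return out, nil // real code would also cmd.Wait() once the reader is drained
}

func (b *RcloneBackend) List(ctx context.Context, prefix string) ([]string, error) {
	out, err := exec.CommandContext(ctx, "rclone", "lsf", b.remote).Output()
	if err != nil {
		return nil, err
	}
	var names []string
	for _, name := range strings.Split(strings.TrimSpace(string(out)), "\n") {
		if strings.HasPrefix(name, prefix) {
			names = append(names, name)
		}
	}
	return names, nil
}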

Rclone's configuration is complicated by OAuth, which needs to be set up in advance. The config system is pluggable, though I'd like to make this easier for projects using rclone as a library.

ncw · Sep 01 '20 13:09

@someone1, any chance you could take a look at this again? I'd like to back my system up to Dropbox but needing to hack it in with file:// targets and rclone mount feels a bit silly when I'm trying to move away from my old hacky backup system in the first place.

AGSPhoenix · Aug 08 '22 23:08

Also interested in this. Is there practically any issue with just having an rclone mount and using file://? I'm hoping to make this work with a google drive target.

digitalsignalperson · Aug 19 '22 21:08

My attempt to use rclone with Google Drive:

sudo ./zfsbackup-go_linux_amd64 send --full testpool/data file:///mnt/remote
2022/08/19 23:52:20 file backend: Error while verifying path /mnt/remote - stat /mnt/remote: permission denied
2022/08/19 23:52:20 Could not initialize backend due to error - stat /mnt/remote: permission denied.

where /mnt/remote is the rclone mount. Changing to a local path like /mnt/fakeremote executes fine.

digitalsignalperson · Aug 19 '22 23:08
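
(A note for anyone hitting the same error: by default a FUSE mount is only accessible to the user who created it, so running zfsbackup-go under sudo against a mount made as a regular user fails with permission denied. If that is the cause here, remounting with rclone's --allow-other flag should help; it requires user_allow_other to be enabled in /etc/fuse.conf. The remote name gdrive: below is a placeholder:)

rclone mount gdrive: /mnt/remote --allow-other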