
Streaming S3 Backups

Open benbjohnson opened this issue 1 year ago • 11 comments

LiteFS provides some redundancy by running in a cluster; however, losing all nodes would cause all data to be lost. Replicating to S3 in a manner similar to Litestream would provide high durability (11 9s) as well as allow point-in-time restores.

Unlike Litestream, LiteFS is designed for efficient compaction of transaction files, so restore time should be much faster.

benbjohnson avatar Jul 26 '22 14:07 benbjohnson

Any updates on the progress of this?

I currently use litestream for backing up one of my databases - could I just run litestream on /litefs/database.db (given a static lease) if I'd like to use litefs?

Koeng101 avatar Apr 04 '23 07:04 Koeng101

> Any updates on the progress of this?

I don't have an ETA but streaming backups is the next thing I'm working on.

> Could I just run litestream on /litefs/database.db (given a static lease) if I'd like to use litefs?

Yes, Litestream should work fine on the LiteFS primary as long as you're using a static lease.

benbjohnson avatar Apr 04 '23 15:04 benbjohnson

👀

jmordica avatar Apr 12 '23 03:04 jmordica

Rather than keeping it limited to S3 only, would it be possible to support other remotes too?

Preferably via something like rclone (https://rclone.org/) similar to how restic did it for backups (https://restic.net/blog/2018-04-01/rclone-backend/).

darthShadow avatar May 05 '23 06:05 darthShadow

@darthShadow I'm not opposed to supporting other endpoints. In my experience with Litestream, though, the vast majority of people used S3 or S3-compatible storage.

I get a bit nervous pulling in a dependency like rclone which does "everything" since it makes it difficult to support all the providers well. Someone will inevitably open an issue for something like Alibaba Cloud (which I've never used before) and I won't have any ability to support it without signing up for an account.

benbjohnson avatar May 05 '23 14:05 benbjohnson

It won't be a direct dependency like a library. The way restic does it is that it uses it as a separate binary on the system and all the communication is done over the stdio interface.

From the above article:

> It turned out that it is also possible to run an HTTP2 connection over stdin/stdout of a newly started process. We’ve implemented this in restic and rclone. Internally, restic runs rclone serve restic --stdio, and it will serve HTTP requests via HTTP2 on stdin/stdout

> support all the providers well

That's the beauty of doing it this way: you only need to support the standard rclone interface, and any provider-specific issues can be redirected to the rclone tracker to be resolved. I believe most (if not all) of the functionality is already tested automatically on every commit/release, and you can also send PRs for any missing tests, so there's very little chance of breakage unless the provider changes something.

In the worst case, rclone can be used to mount any provider as a local filesystem and then configure that as the local backup/restore point.
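As a rough sketch of that fallback, and assuming Litestream stands in for the backup tool (this thread does not describe a built-in LiteFS backup target), a file replica can point at a directory where rclone has mounted the remote. The remote name, bucket, and mount point below are hypothetical.

```yaml
# Assumption: the remote was mounted beforehand with something like
#   rclone mount myremote:my-bucket /mnt/backup --daemon
# "myremote", "my-bucket", and /mnt/backup are placeholder names.
dbs:
  - path: /litefs/database.db        # database exposed through the LiteFS FUSE mount
    replicas:
      - type: file                   # Litestream's local-filesystem replica type
        path: /mnt/backup/database   # directory on the rclone-mounted remote
```

Any provider-specific behavior then lives entirely in the rclone mount, and the backup tool only ever sees a local path.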

Sorry for the delay in the response, missed the comment notification.

darthShadow avatar May 15 '23 16:05 darthShadow

That's an interesting approach. Yeah, I'm not opposed to supporting restic if it's just an HTTP proxy or STDIN/STDOUT API.

benbjohnson avatar May 16 '23 00:05 benbjohnson

Is this still being pursued? In theory I could still use litestream in parallel to LiteFS by ensuring it only runs on the primary, right?

tionis avatar Sep 11 '23 00:09 tionis

> Is this still being pursued?

A lot of the backup functionality has been implemented in LiteFS Cloud which we'll likely open source in the near future. We may also do a simplified version that's built into LiteFS which doesn't try to do multi-level compactions. That would trade restore performance for simplicity.

> In theory I could still use litestream in parallel to LiteFS by ensuring it only runs on the primary, right?

Yes, you can still use Litestream on a LiteFS primary. It works best if you have a static primary; however, you could also separately back up multiple candidate nodes to different Litestream replication paths.
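One way to read the multi-candidate suggestion is a Litestream config per node, each writing under its own prefix so that replicas from different nodes never interleave. This is only a sketch: the bucket name and the node-a/node-b prefixes are made up for illustration.

```yaml
# Litestream config on candidate node "node-a" (hypothetical node name)
dbs:
  - path: /litefs/database.db
    replicas:
      - url: s3://example-bucket/database/node-a

# On "node-b" the same file would differ only in the replica prefix:
#       - url: s3://example-bucket/database/node-b
```

With this layout, a restore would presumably need to pick the prefix belonging to whichever node was most recently primary.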

benbjohnson avatar Sep 11 '23 15:09 benbjohnson

Alright, thanks for the update!

tionis avatar Sep 11 '23 16:09 tionis

> > Any updates on the progress of this?
>
> I don't have an ETA but streaming backups is the next thing I'm working on.
>
> > Could I just run litestream on /litefs/database.db (given a static lease) if I'd like to use litefs?
>
> Yes, Litestream should work fine on the LiteFS primary as long as you're using a static lease.

Is there anyone who has managed to set this up successfully?

When I tried, it seemed that LiteFS was supposed to capture changes in the target database, but no changes showed up there, so Litestream did not replicate anything.

I'm not sure if this is the correct approach.

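    # LiteFS configuration
    fuse:
      dir: "/data"              # FUSE mount that applications read and write through
      allow-other: true
      debug: true

    data:
      dir: "/var/lib/litefs"    # LiteFS internal storage

    http:
      addr: ":20202"

    lease:
      type: "static"
      advertise-url: "http://0.0.0.0:20202"
      candidate: true

    # Litestream configuration
    dbs:
      - path: /data/data.db     # database inside the LiteFS FUSE mount
        replicas:
          - name: backup
            url: abs://datafolder@data/data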

mrchypark avatar Feb 23 '24 05:02 mrchypark