2-Way Replication to S3
Hey, this is awesome!
As a potential enhancement, would it be possible to periodically watch the S3 bucket for updates and restore whenever a new version is published?
This would be excellent for low-load distributed systems. Multiple nodes would be able to write to their local SQLite databases; each write would be pushed up to the S3 bucket, and Litestream would then see it and update the local SQLite databases on all of the other nodes in that system.
Thanks! Monitoring the S3 bucket from the replica side is a good idea. It wouldn't work for multiple writer nodes, though, as you could have conflicting WAL writes at the same time and no real way to resolve the conflict. It'd have to be a single writer fanning out to multiple read replicas.
Isn't this issue the same as #8? I was looking for a way to bring changes back to read replicas and found these two issues, but I just want to confirm I'm not misunderstanding what this one is about.
@lambrospetrou I think that in this particular #15 issue, your replica would check for updates in S3 (with polling). There are 3 components: writer -> S3 -> replica. There is replication lag, but your data is safely stored in between.
In the #8 issue, your Litestream replica would serve an HTTP endpoint and directly act as a remote WAL receiver. There are only 2 components: writer -> replica. This architecture allows the replica to apply WAL frames almost instantaneously.
But I think that the two methods are fully compatible: you can write to S3 to get global data redundancy AND write to a "hot read replica", since both of them are replicas from Litestream's point of view.
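To illustrate, both targets would just be entries under the same database in the writer's litestream.yml. Since the #8-style hot replica doesn't exist yet, a plain file replica stands in for it in this sketch (all paths and the bucket name are placeholders):

dbs:
  - path: /var/lib/app/my_db.db
    replicas:
      - url: s3://my_bucket/path/to/my_db.db    # durable copy that readers can poll (#15)
      - path: /mnt/replicas/my_db               # stand-in for the hot replica idea (#8)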
Ah OK, in that case the S3 approach is what I was after as well :)
I was thinking these issues are so similar that perhaps #8 really just needs a way to notify when a new WAL frame is pushed, so any polling can be made more timely.
@andrewchambers I was thinking that I would just add a polling version of #8. Cloud storage tends to be expensive for downloads, so that path is a lot more cost-sensitive. I could add something like SQS to notify, but that feels overly complicated.
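To sketch what the SQS route would involve (all names here are hypothetical placeholders, and it assumes the bucket is configured to publish s3:ObjectCreated events to the queue), the replica side would be roughly:

import subprocess

import boto3

# Hypothetical placeholders -- adjust to your setup.
QUEUE_URL = "https://sqs.ap-south-1.amazonaws.com/123456789012/litestream-events"
BUCKET_NAME = "my_bucket"
S3_KEY = "path/to/my_db.db"
DB_NAME = "my_db.db"
TEMP_NAME = "my_db_temp.db"

sqs = boto3.client("sqs", region_name="ap-south-1")

while True:
    # Long-poll for up to 20 seconds; an idle replica makes ~3 requests/minute.
    resp = sqs.receive_message(
        QueueUrl=QUEUE_URL, MaxNumberOfMessages=10, WaitTimeSeconds=20
    )
    messages = resp.get("Messages", [])
    if not messages:
        continue
    # One or more WAL segments landed in the bucket: restore once for the batch.
    # litestream won't restore over an existing DB, hence the temp file + mv.
    subprocess.run(["litestream", "restore", "-o", TEMP_NAME,
                    "s3://{}/{}".format(BUCKET_NAME, S3_KEY)])
    subprocess.run(["mv", TEMP_NAME, DB_NAME])
    sqs.delete_message_batch(
        QueueUrl=QUEUE_URL,
        Entries=[{"Id": m["MessageId"], "ReceiptHandle": m["ReceiptHandle"]}
                 for m in messages],
    )

That's an extra queue plus bucket-notification config to manage per replica, which is where the complexity comes from.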
@benbjohnson fwiw, my main use case is to have a website running on a single server, while I have multiple read-only auth gateways that just follow whatever the website has configured.
This would obviously be a pretty nifty addition to Litestream. Until the feature is built, here's the simple Python wrapper I'm using to meet my needs (a read replica for use with Grafana).
import os
import subprocess
from datetime import datetime, timezone
from time import sleep

import boto3

INTERVAL = 60                 # seconds between polls
DB_NAME = "my_db.db"          # database path read by Grafana
TEMP_NAME = "my_db_temp.db"   # litestream will not restore over an existing DB
S3_KEY = "path/to/my_db.db"   # replica path within the bucket
BUCKET_NAME = "my_bucket"

s3 = boto3.client('s3', region_name='ap-south-1')

# Epoch start (1970-01-01 00:00:00 UTC), so the first pass always syncs.
last_sync = datetime.fromtimestamp(0, tz=timezone.utc)
start_after = ''

while True:
    # Small optimisation to ensure we sync only if S3 has an updated object.
    objects = s3.list_objects_v2(Bucket=BUCKET_NAME, Prefix=S3_KEY, StartAfter=start_after)
    contents = objects.get('Contents')
    last_modified = last_sync
    if contents:
        for o in contents:
            modified = o['LastModified']
            if modified > last_modified:
                start_after = o['Key']
                last_modified = modified
    if last_modified > last_sync:
        last_sync = last_modified
        print("Sync started with last_modified:", last_sync)
        subprocess.run(["litestream", "restore", "-o", TEMP_NAME,
                        "s3://{}/{}".format(BUCKET_NAME, S3_KEY)])
        # Rename over the live DB: litestream will not restore over an existing
        # file, and the rename is atomic on the same filesystem.
        os.replace(TEMP_NAME, DB_NAME)
        print("Sync completed")
    sleep(INTERVAL)
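Worth noting for anyone copying this: the rename swaps the file atomically (on the same filesystem), but a process that already has the old database file open will keep reading the old data until it reopens the file, so a long-lived reader may need to reconnect after a sync.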