2-Way Replication to S3
Hey, this is awesome!
As a potential enhancement, would it be possible to periodically watch the S3 bucket for updates and restore whenever a new version is published?
This would be excellent for low-load distributed systems. Multiple nodes would be able to write to their local SQLite databases; each write would be pushed up to the S3 bucket, and Litestream would then see it and update the local SQLite databases on all of the other nodes in that system.
Thanks! Monitoring the S3 bucket from the replica side is a good idea. It wouldn't work for multiple writer nodes, though, as you could have conflicting WAL writes at the same time and no real way to resolve the conflict. It'd have to be a single writer fanning out to multiple read replicas.
Isn't this issue the same as #8? I was looking for a way to bring changes back to read replicas and found these two issues, but I just want to confirm I'm not misunderstanding what this one is about.
@lambrospetrou I think that in this particular #15 issue, your replica would check for updates in S3 (with polling). There are 3 components: writer -> S3 -> replica. There is replication lag, but your data is safely stored in between.
In the #8 issue, your Litestream replica would serve an HTTP endpoint and directly act as a remote WAL receiver. There are only 2 components: writer -> replica. This architecture allows the replica to apply WAL frames almost instantaneously.
But I think that the two methods are fully compatible: you can write to S3 to get global data redundancy AND write to a "hot read replica", since both of them are replicas from Litestream's point of view.
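To illustrate, both targets would just be entries under the same database in the writer's litestream.yml. Since the #8-style hot replica doesn't exist yet, a plain file replica stands in for it in this sketch (all paths and the bucket name are placeholders):

dbs:
  - path: /var/lib/app/my_db.db
    replicas:
      - url: s3://my_bucket/path/to/my_db.db    # durable copy that readers can poll (#15)
      - path: /mnt/replicas/my_db               # stand-in for the hot replica idea (#8)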
Ah OK, in that case the S3 approach is what I was after as well :)
I was thinking these issues are so similar that perhaps #8 really just needs a way to notify when a new WAL frame is pushed, so any polling can be made more timely.
@andrewchambers I was thinking that I would just add a polling version of #8. Cloud storage tends to be expensive for downloads, so that path is a lot more cost-sensitive. I could add something like SQS to notify, but that feels overly complicated.
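To sketch what the SQS route would involve (all names here are hypothetical placeholders, and it assumes the bucket is configured to publish s3:ObjectCreated events to the queue), the replica side would be roughly:

import subprocess

import boto3

# Hypothetical placeholders -- adjust to your setup.
QUEUE_URL = "https://sqs.ap-south-1.amazonaws.com/123456789012/litestream-events"
BUCKET_NAME = "my_bucket"
S3_KEY = "path/to/my_db.db"
DB_NAME = "my_db.db"
TEMP_NAME = "my_db_temp.db"

sqs = boto3.client("sqs", region_name="ap-south-1")

while True:
    # Long-poll for up to 20 seconds; an idle replica makes ~3 requests/minute.
    resp = sqs.receive_message(
        QueueUrl=QUEUE_URL, MaxNumberOfMessages=10, WaitTimeSeconds=20
    )
    messages = resp.get("Messages", [])
    if not messages:
        continue
    # One or more WAL segments landed in the bucket: restore once for the batch.
    # litestream won't restore over an existing DB, hence the temp file + mv.
    subprocess.run(["litestream", "restore", "-o", TEMP_NAME,
                    "s3://{}/{}".format(BUCKET_NAME, S3_KEY)])
    subprocess.run(["mv", TEMP_NAME, DB_NAME])
    sqs.delete_message_batch(
        QueueUrl=QUEUE_URL,
        Entries=[{"Id": m["MessageId"], "ReceiptHandle": m["ReceiptHandle"]}
                 for m in messages],
    )

That's an extra queue plus bucket-notification config to manage per replica, which is where the complexity comes from.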
@benbjohnson fwiw, my main use case is to have a website running on a single server, while I have multiple read-only auth gateways that just follow whatever the website has configured.
This would obviously be a pretty nifty addition to Litestream. Until the feature is built, here's the simple Python wrapper I'm using to meet my needs (a read replica for use with Grafana).
import os
import subprocess
from datetime import datetime, timezone
from time import sleep

import boto3

INTERVAL = 60                 # seconds between polls
DB_NAME = "my_db.db"          # database path read by Grafana
TEMP_NAME = "my_db_temp.db"   # litestream will not restore over an existing DB
S3_KEY = "path/to/my_db.db"   # replica path within the bucket
BUCKET_NAME = "my_bucket"

s3 = boto3.client('s3', region_name='ap-south-1')

# Epoch start (1970-01-01 00:00:00 UTC), so the first pass always syncs.
last_sync = datetime.fromtimestamp(0, tz=timezone.utc)
start_after = ''

while True:
    # Small optimisation to ensure we sync only if S3 has an updated object.
    objects = s3.list_objects_v2(Bucket=BUCKET_NAME, Prefix=S3_KEY, StartAfter=start_after)
    contents = objects.get('Contents')
    last_modified = last_sync
    if contents:
        for o in contents:
            modified = o['LastModified']
            if modified > last_modified:
                start_after = o['Key']
                last_modified = modified
    if last_modified > last_sync:
        last_sync = last_modified
        print("Sync started with last_modified:", last_sync)
        subprocess.run(["litestream", "restore", "-o", TEMP_NAME,
                        "s3://{}/{}".format(BUCKET_NAME, S3_KEY)])
        # Rename over the live DB: litestream will not restore over an existing
        # file, and the rename is atomic on the same filesystem.
        os.replace(TEMP_NAME, DB_NAME)
        print("Sync completed")
    sleep(INTERVAL)
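Worth noting for anyone copying this: the rename swaps the file atomically (on the same filesystem), but a process that already has the old database file open will keep reading the old data until it reopens the file, so a long-lived reader may need to reconnect after a sync.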