litestream
Custom S3-Compatible Storage Support other than localhost
Hello. I was trying to use Litestream with custom MinIO storage, but it was not working, so I checked why and found that there is a regex match check when parsing the S3 host.
https://github.com/benbjohnson/litestream/blob/main/s3/replica_client.go#L684-L701
if a := localhostRegex.FindStringSubmatch(host); a != nil {
    bucket, region = a[1], "us-east-1"
    scheme, endpoint = "http", "localhost"
} else if a := backblazeRegex.FindStringSubmatch(host); a != nil {
    bucket, region = a[1], a[2]
    endpoint = fmt.Sprintf("s3.%s.backblazeb2.com", region)
} else if a := filebaseRegex.FindStringSubmatch(host); a != nil {
    bucket, endpoint = a[1], "s3.filebase.com"
} else if a := digitalOceanRegex.FindStringSubmatch(host); a != nil {
    bucket, region = a[1], a[2]
    endpoint = fmt.Sprintf("%s.digitaloceanspaces.com", region)
} else if a := linodeRegex.FindStringSubmatch(host); a != nil {
    bucket, region = a[1], a[2]
    endpoint = fmt.Sprintf("%s.linodeobjects.com", region)
} else {
    bucket = host
    forcePathStyle = false
}
As far as I know, forcePathStyle should be true for S3-compatible storage like MinIO, but it seems that if the URL does not contain localhost, it cannot be used: a custom host matches none of the regexes above, so it falls through to the final else branch, where the whole host is treated as an AWS bucket name and forcePathStyle is set to false.
For the REPLICA_URL I used this value: s3://backup.s3.mysite.com/litestream-testing/file.db.
@kesuskim Unfortunately, there's not a way to distinguish between an AWS S3 bucket and a MinIO bucket just from the hostname. Perhaps we could use a minio:// prefix to hack around that (even though it still uses the same underlying s3 client).
I see. Maybe the minio:// scheme hack would be one way.
Accidentally closed the issue, so reopened it until resolved.
There might be Ceph or other object storage, so using minio:// as the scheme could be confusing. Or maybe we should make that clear in the documentation.
Hi. Hopefully I'm not wrong, but isn't --endpoint-url part of the AWS CLI/SDKs by design? https://docs.aws.amazon.com/cli/latest/reference/#options
aws --endpoint-url=http://<ip|dns> s3api list-buckets
aws --endpoint-url=http://<ip|dns> s3 cp s3://bucket/file.yaml file.yaml
I think it is this in the Go SDK? https://docs.aws.amazon.com/sdk-for-go/v1/developer-guide/configuring-sdk.html#custom-endpoint
I think it is configurable on instantiation? https://github.com/benbjohnson/litestream/blob/main/s3/replica_client.go#L101 https://github.com/benbjohnson/litestream/blob/main/s3/replica_client.go#L139
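For reference, this is roughly what pointing the v1 Go SDK at a custom S3-compatible endpoint looks like; the MinIO hostname below is a placeholder and the credentials are assumed to come from the environment.

package main

import (
    "fmt"
    "log"

    "github.com/aws/aws-sdk-go/aws"
    "github.com/aws/aws-sdk-go/aws/session"
    "github.com/aws/aws-sdk-go/service/s3"
)

func main() {
    // Point the SDK at a custom S3-compatible endpoint (e.g. MinIO) instead of
    // the default AWS endpoints. Path-style addressing is usually required for
    // such servers.
    sess, err := session.NewSession(&aws.Config{
        Region:           aws.String("us-east-1"),
        Endpoint:         aws.String("http://minio.internal:9000"), // placeholder endpoint
        S3ForcePathStyle: aws.Bool(true),
    })
    if err != nil {
        log.Fatal(err)
    }

    // List buckets just to verify that the custom endpoint is reachable.
    out, err := s3.New(sess).ListBuckets(&s3.ListBucketsInput{})
    if err != nil {
        log.Fatal(err)
    }
    for _, b := range out.Buckets {
        fmt.Println(aws.StringValue(b.Name))
    }
}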
@juliostanley Yes, the endpoint is configurable in Litestream using the configuration file. However, Litestream supports a more compact URL format (e.g. s3://mybucket/db) that is nice when you just want to restore a database on a machine without a config file set up, or when you are doing some testing.
The problem is that there's no way to distinguish s3://mybkt.litestream.io/db as being a MinIO server rather than an AWS S3 bucket named mybkt.litestream.io. Adding support for a special minio:// scheme would distinguish it as minio://mybkt.litestream.io/db.
Maybe a more generic s3+endpoint:// scheme would be better (although more verbose)?
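Purely as an illustration of the idea (nothing like this exists in Litestream today), such a scheme could carry the endpoint in the URL host and the bucket as the first path segment; the s3+endpoint name and the parsing below are hypothetical.

package main

import (
    "fmt"
    "net/url"
    "strings"
)

// parseEndpointURL is a hypothetical parser for URLs such as
// s3+endpoint://minio.internal:9000/mybkt/db, where the host names the
// S3-compatible endpoint and the first path segment names the bucket.
func parseEndpointURL(raw string) (endpoint, bucket, path string, err error) {
    u, err := url.Parse(raw)
    if err != nil {
        return "", "", "", err
    }
    if u.Scheme != "s3+endpoint" {
        return "", "", "", fmt.Errorf("unexpected scheme: %q", u.Scheme)
    }
    parts := strings.SplitN(strings.TrimPrefix(u.Path, "/"), "/", 2)
    bucket = parts[0]
    if len(parts) > 1 {
        path = parts[1]
    }
    return u.Host, bucket, path, nil
}

func main() {
    endpoint, bucket, path, err := parseEndpointURL("s3+endpoint://minio.internal:9000/mybkt/db")
    if err != nil {
        panic(err)
    }
    fmt.Println(endpoint, bucket, path) // minio.internal:9000 mybkt db
}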
I think the s3+endpoint:// scheme would be clearer.
@benbjohnson My first time using Litestream, and I missed that in the docs. It definitely works, and I hadn't taken into account the multiple URL options for the replica (assuming multiple protocols for destinations). Thank you for explaining.
Yes, s3+endpoint:// would make sense to me, or endpoint+s3://.
Another good option might be s3+compatible://. I feel like "endpoint" is more of an implementation detail and not as clear. 🤷‍♂️
Yep, s3+compatible could be a more agreeable term, more familiar to hear.
The problem is that there's no way to distinguish s3://mybkt.litestream.io/db as being a MinIO server rather than an AWS S3 bucket named mybkt.litestream.io.
The gocloud.dev/blob library understands URLs like this:
s3://bucketname?endpoint=host:port&disableSSL=true&s3ForcePathStyle=true&region=wtf
https://gocloud.dev/concepts/urls/
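For comparison, opening a bucket that way with gocloud.dev looks roughly like this (the endpoint, bucket, and object key are made-up examples):

package main

import (
    "context"
    "fmt"
    "log"

    "gocloud.dev/blob"
    _ "gocloud.dev/blob/s3blob" // registers the s3:// URL scheme
)

func main() {
    ctx := context.Background()

    // The endpoint, SSL, and path-style settings ride along as query
    // parameters instead of needing a separate URL scheme.
    bucket, err := blob.OpenBucket(ctx,
        "s3://mybkt?endpoint=minio.internal:9000&disableSSL=true&s3ForcePathStyle=true&region=us-east-1")
    if err != nil {
        log.Fatal(err)
    }
    defer bucket.Close()

    data, err := bucket.ReadAll(ctx, "db/some-object")
    if err != nil {
        log.Fatal(err)
    }
    fmt.Println(len(data))
}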
An alternate trick is to remember that the double-slash // in URLs means host (no matter how much people abuse that). Make the bucket the first segment of the path, set the default hostname to AWS S3, and use the hostname to override it.
s3://ambiguous/db -> host="ambiguous" path="/db" opaque=""
s3:/path/db -> host="" path="/path/db" opaque=""
s3:host/db -> host="" path="" opaque="host/db"
https://play.golang.org/p/yV9W4PHH8w_u
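A self-contained version of what the playground link demonstrates, with the parsed fields shown in comments:

package main

import (
    "fmt"
    "net/url"
)

func main() {
    for _, raw := range []string{
        "s3://ambiguous/db", // host="ambiguous" path="/db" opaque=""
        "s3:/path/db",       // host="" path="/path/db" opaque=""
        "s3:host/db",        // host="" path="" opaque="host/db"
    } {
        u, err := url.Parse(raw)
        if err != nil {
            panic(err)
        }
        fmt.Printf("%-18s host=%q path=%q opaque=%q\n", raw, u.Host, u.Path, u.Opaque)
    }
}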
@tv42 That's a good point. Allowing query params could be a better solution. I didn't realize you could write URLs with a single slash or no slash. That's really good to know (although probably super confusing for users).
This issue still persists as of 15/12/2022. It makes us unable to run Litestream commands against a MinIO instance that is in the cloud. Does anyone have a workaround?
@joaofrf You should be able to run against a minio instance by specifying it in the Litestream config file instead of using the URL format.
@benbjohnson My problem is that I need to be replicating and restoring. Is it possible to "listen on my MinIO" for changes while sending my changes to my bucket? What I need is:
- while listening for changes, update my sqlite.db locally and if my sqlite.db is changed, update the bucket
Is this achievable using the config file?
Right now I have managed to get this "working" partially by replicating and restoring constantly, and I'm able to run commands because I'm using localhost.
@joaofrf Replicating to another live SQLite database is no longer a part of the goal of Litestream. Distributed SQLite has been moved to another project called LiteFS.
As for restoring from a MinIO instance in your config, you can specify the database path in the config to restore from its replica: https://litestream.io/reference/restore/#with-a-database-path
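For example, with the MinIO replica defined in the config file, a restore by database path (the paths here are placeholders) would look something like:
litestream restore -o /tmp/restored.db /var/lib/app/db.sqlite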
Hey there @benbjohnson, thank you for your answer. I've looked into LiteFS, however it seems to be a service rather than a solution. What I mean by this is that they sell it as a hosted service, and I need to implement this on my own side because of privacy and availability concerns (my final product might not have access to the internet). Is there another solution for this?
Also, if you know anything about this I would deeply appreciate it:
litestream restore -v -o db.sqlite s3://mybkt.localhost:9000/db.sqlite
2022/12/20 10:41:06.317258 s3: restoring snapshot 8b2784dc281d5c22/00000000 to db.sqlite.tmp
2022/12/20 10:41:06.342383 s3: restoring wal files: generation=8b2784dc281d5c22 index=[00000000,00000000]
cannot download wal 8b2784dc281d5c22/00000000: RequestError: send request failed
caused by: Get "http://localhost:9000/mybkt/db.sqlite/generations/8b2784dc281d5c22/wal/00000000_0304b068.wal.lz4": dial tcp: lookup localhost: too many open files
After a few restores I'm getting the "too many open files" error, even after restarting the server the error persists.
My scenario is:
machine 1: Litestream + local MinIO (replicating)
machine 2: Litestream + local MinIO
The MinIO instances are synced using their site replication (seems to be working just fine).
Any tips would be appreciated. My ultimate goal is multi-master replication.
I've looked into LiteFS, however it seems to be a service rather than a solution.
It's an open source project that you can run anywhere. Its development is sponsored by Fly.io but it's not required to run on their hosting.
After a few restores I'm getting the "too many open files" error, even after restarting the server the error persists.
I'm not sure why you'd get too many open files after running the restore multiple times. After the command ends, the file descriptors are cleaned up by the OS. You can adjust your open file descriptor limits using the ulimit command on Linux.
My ultimate goal is Multi-master replication
Litestream & LiteFS are both physical replication tools which means they copy out changes byte-for-byte. There's not really a way to do multi-master with physical replication. You may want to look into something like Mycelial to allow for writes on different nodes and having them resolve the writes later.