litestream
Custom S3-Compatible Storage Support other than localhost
Hello. I was trying to use Litestream with custom MinIO storage, but it was not working, so I checked why and found that there is a regex match check when parsing the S3 host.
https://github.com/benbjohnson/litestream/blob/main/s3/replica_client.go#L684-L701
if a := localhostRegex.FindStringSubmatch(host); a != nil {
    bucket, region = a[1], "us-east-1"
    scheme, endpoint = "http", "localhost"
} else if a := backblazeRegex.FindStringSubmatch(host); a != nil {
    bucket, region = a[1], a[2]
    endpoint = fmt.Sprintf("s3.%s.backblazeb2.com", region)
} else if a := filebaseRegex.FindStringSubmatch(host); a != nil {
    bucket, endpoint = a[1], "s3.filebase.com"
} else if a := digitalOceanRegex.FindStringSubmatch(host); a != nil {
    bucket, region = a[1], a[2]
    endpoint = fmt.Sprintf("%s.digitaloceanspaces.com", region)
} else if a := linodeRegex.FindStringSubmatch(host); a != nil {
    bucket, region = a[1], a[2]
    endpoint = fmt.Sprintf("%s.linodeobjects.com", region)
} else {
    bucket = host
    forcePathStyle = false
}
As far as I know, forcePathStyle should be true for S3-compatible storage like MinIO, but it seems that if the URL does not contain localhost, it cannot be used: a custom host matches none of the regexes above, so it falls through to the final else branch, where the whole host is treated as an AWS bucket name and forcePathStyle is set to false.
For the REPLICA_URL I used this value: s3://backup.s3.mysite.com/litestream-testing/file.db.
@kesuskim Unfortunately, there's not a way to distinguish between an AWS S3 bucket and a MinIO bucket just from the hostname. Perhaps we could use a minio:// prefix to hack around that (even though it still uses the same underlying s3 client).
I see. Maybe the minio:// scheme hack would be one way.
Accidentally closed the issue, so reopened it until resolved.
There might be Ceph or other object storage, so using minio:// as the scheme could be confusing. Or maybe we should make that clear in the documentation.
Hi. Hopefully I'm not wrong, but isn't --endpoint-url part of the AWS CLI/SDKs by design? https://docs.aws.amazon.com/cli/latest/reference/#options
aws --endpoint-url=http://<ip|dns> s3api list-buckets
aws --endpoint-url=http://<ip|dns> s3 cp s3://bucket/file.yaml file.yaml
I think it is this in the Go SDK? https://docs.aws.amazon.com/sdk-for-go/v1/developer-guide/configuring-sdk.html#custom-endpoint
I think it is configurable on instantiation? https://github.com/benbjohnson/litestream/blob/main/s3/replica_client.go#L101 https://github.com/benbjohnson/litestream/blob/main/s3/replica_client.go#L139
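For reference, this is roughly what pointing the v1 Go SDK at a custom S3-compatible endpoint looks like; the MinIO hostname below is a placeholder and the credentials are assumed to come from the environment.

package main

import (
    "fmt"
    "log"

    "github.com/aws/aws-sdk-go/aws"
    "github.com/aws/aws-sdk-go/aws/session"
    "github.com/aws/aws-sdk-go/service/s3"
)

func main() {
    // Point the SDK at a custom S3-compatible endpoint (e.g. MinIO) instead of
    // the default AWS endpoints. Path-style addressing is usually required for
    // such servers.
    sess, err := session.NewSession(&aws.Config{
        Region:           aws.String("us-east-1"),
        Endpoint:         aws.String("http://minio.internal:9000"), // placeholder endpoint
        S3ForcePathStyle: aws.Bool(true),
    })
    if err != nil {
        log.Fatal(err)
    }

    // List buckets just to verify that the custom endpoint is reachable.
    out, err := s3.New(sess).ListBuckets(&s3.ListBucketsInput{})
    if err != nil {
        log.Fatal(err)
    }
    for _, b := range out.Buckets {
        fmt.Println(aws.StringValue(b.Name))
    }
}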
@juliostanley Yes, the endpoint is configurable in Litestream using the configuration file. However, Litestream supports a more compact URL format (e.g. s3://mybucket/db) that is nice when you just want to restore a database on a machine without a config file set up, or when you are doing some testing.
The problem is that there's no way to distinguish s3://mybkt.litestream.io/db as being a MinIO server rather than an AWS S3 bucket named mybkt.litestream.io. Adding support for a special minio:// scheme would distinguish it as minio://mybkt.litestream.io/db.
Maybe a more generic s3+endpoint:// scheme would be better (although more verbose)?
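Purely as an illustration of the idea (nothing like this exists in Litestream today), such a scheme could carry the endpoint in the URL host and the bucket as the first path segment; the s3+endpoint name and the parsing below are hypothetical.

package main

import (
    "fmt"
    "net/url"
    "strings"
)

// parseEndpointURL is a hypothetical parser for URLs such as
// s3+endpoint://minio.internal:9000/mybkt/db, where the host names the
// S3-compatible endpoint and the first path segment names the bucket.
func parseEndpointURL(raw string) (endpoint, bucket, path string, err error) {
    u, err := url.Parse(raw)
    if err != nil {
        return "", "", "", err
    }
    if u.Scheme != "s3+endpoint" {
        return "", "", "", fmt.Errorf("unexpected scheme: %q", u.Scheme)
    }
    parts := strings.SplitN(strings.TrimPrefix(u.Path, "/"), "/", 2)
    bucket = parts[0]
    if len(parts) > 1 {
        path = parts[1]
    }
    return u.Host, bucket, path, nil
}

func main() {
    endpoint, bucket, path, err := parseEndpointURL("s3+endpoint://minio.internal:9000/mybkt/db")
    if err != nil {
        panic(err)
    }
    fmt.Println(endpoint, bucket, path) // minio.internal:9000 mybkt db
}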
I think the s3+endpoint:// scheme would be clearer.
@benbjohnson My first time using Litestream, and I missed that in the docs. It definitely works, and I hadn't taken into account the multiple URL options for the replica (assuming multiple protocols for destinations). Thank you for explaining.
Yes, s3+endpoint:// would make sense to me, or endpoint+s3://.
Another good option might be s3+compatible://. I feel like "endpoint" is more of an implementation detail and not as clear. 🤷‍♂️
Yep, s3+compatible could be a more agreeable term, more familiar to hear.
The problem is that there's no way to distinguish s3://mybkt.litestream.io/db as being a MinIO server rather than an AWS S3 bucket named mybkt.litestream.io.
The gocloud.dev/blob library understands URLs like this:
s3://bucketname?endpoint=host:port&disableSSL=true&s3ForcePathStyle=true&region=wtf
https://gocloud.dev/concepts/urls/
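For comparison, opening a bucket that way with gocloud.dev looks roughly like this (the endpoint, bucket, and object key are made-up examples):

package main

import (
    "context"
    "fmt"
    "log"

    "gocloud.dev/blob"
    _ "gocloud.dev/blob/s3blob" // registers the s3:// URL scheme
)

func main() {
    ctx := context.Background()

    // The endpoint, SSL, and path-style settings ride along as query
    // parameters instead of needing a separate URL scheme.
    bucket, err := blob.OpenBucket(ctx,
        "s3://mybkt?endpoint=minio.internal:9000&disableSSL=true&s3ForcePathStyle=true&region=us-east-1")
    if err != nil {
        log.Fatal(err)
    }
    defer bucket.Close()

    data, err := bucket.ReadAll(ctx, "db/some-object")
    if err != nil {
        log.Fatal(err)
    }
    fmt.Println(len(data))
}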
An alternate trick is to remember that the double-slash // in URLs means host (no matter how much people abuse that). Make the bucket the first segment of the path, set the default hostname to AWS S3, and use the hostname to override it.
s3://ambiguous/db -> host="ambiguous" path="/db" opaque=""
s3:/path/db -> host="" path="/path/db" opaque=""
s3:host/db -> host="" path="" opaque="host/db"
https://play.golang.org/p/yV9W4PHH8w_u
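A self-contained version of what the playground link demonstrates, with the parsed fields shown in comments:

package main

import (
    "fmt"
    "net/url"
)

func main() {
    for _, raw := range []string{
        "s3://ambiguous/db", // host="ambiguous" path="/db" opaque=""
        "s3:/path/db",       // host="" path="/path/db" opaque=""
        "s3:host/db",        // host="" path="" opaque="host/db"
    } {
        u, err := url.Parse(raw)
        if err != nil {
            panic(err)
        }
        fmt.Printf("%-18s host=%q path=%q opaque=%q\n", raw, u.Host, u.Path, u.Opaque)
    }
}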
@tv42 That's a good point. Allowing query params could be a better solution. I didn't realize you could write URLs with a single slash or no slash. That's really good to know (although probably super confusing for users).
This issue still persists as of 15/12/2022. It makes us unable to run Litestream commands against a MinIO instance that is in the cloud. Does anyone have a workaround?
@joaofrf You should be able to run against a minio instance by specifying it in the Litestream config file instead of using the URL format.
@benbjohnson My problem is that I need to be replicating and restoring. Is it possible to "listen on my MinIO" for changes while sending my changes to my bucket? What I need is:
- while listening for changes, update my sqlite.db locally and if my sqlite.db is changed, update the bucket
Is this achievable using the config file?
Right now I have managed to get this "working" partially by replicating and restoring constantly, and I'm able to run commands because I'm using localhost.
@joaofrf Replicating to another live SQLite database is no longer a part of the goal of Litestream. Distributed SQLite has been moved to another project called LiteFS.
As for restoring from a MinIO instance in your config, you can specify the database path in the config to restore from its replica: https://litestream.io/reference/restore/#with-a-database-path
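For example, with the MinIO replica defined in the config file, a restore by database path (the paths here are placeholders) would look something like:
litestream restore -o /tmp/restored.db /var/lib/app/db.sqlite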
Hey there @benbjohnson, thank you for your answer. I've looked into LiteFS, however it seems to be a service rather than a solution. What I mean by this is that they sell it as a hosted service, and I need to implement this on my own side because of privacy and availability concerns (my final product might not have access to the internet). Is there another solution for this?
Also, if you know anything about this I would deeply appreciate it:
litestream restore -v -o db.sqlite s3://mybkt.localhost:9000/db.sqlite
2022/12/20 10:41:06.317258 s3: restoring snapshot 8b2784dc281d5c22/00000000 to db.sqlite.tmp
2022/12/20 10:41:06.342383 s3: restoring wal files: generation=8b2784dc281d5c22 index=[00000000,00000000]
cannot download wal 8b2784dc281d5c22/00000000: RequestError: send request failed
caused by: Get "http://localhost:9000/mybkt/db.sqlite/generations/8b2784dc281d5c22/wal/00000000_0304b068.wal.lz4": dial tcp: lookup localhost: too many open files
After a few restores I'm getting the "too many open files" error, even after restarting the server the error persists.
My scenario is:
machine 1: Litestream + local MinIO (replicating)
machine 2: Litestream + local MinIO
The MinIO instances are synced using their site replication (seems to be working just fine).
Any tips would be appreciated. My ultimate goal is multi-master replication.
I've looked into LiteFS, however it seems to be a service rather than a solution.
It's an open source project that you can run anywhere. Its development is sponsored by Fly.io but it's not required to run on their hosting.
After a few restores I'm getting the "too many open files" error, even after restarting the server the error persists.
I'm not sure why you'd get too many open files after running the restore multiple times. After the command ends, the file descriptors are cleaned up by the OS. You can adjust your open file descriptor limits using the ulimit command on Linux.
My ultimate goal is Multi-master replication
Litestream & LiteFS are both physical replication tools which means they copy out changes byte-for-byte. There's not really a way to do multi-master with physical replication. You may want to look into something like Mycelial to allow for writes on different nodes and having them resolve the writes later.