Sync to custom S3 APIs
Similar to the Azure sync that would be implemented in https://github.com/GothenburgBitFactory/taskchampion/issues/369, the GCP sync implemented in https://github.com/GothenburgBitFactory/taskwarrior/issues/3185, and the AWS sync implemented in https://github.com/GothenburgBitFactory/taskchampion/issues/368, we should be able to sync replicas to S3-compatible object storage such as MinIO, SeaweedFS, Garage, Versity's S3 Gateway, or even more enterprise-oriented solutions like Tigris.
I'm not sure how easy it would be to allow a custom endpoint, or whether it's already possible through the AWS CLI client options, but I tried setting the AWS_ENDPOINT_URL environment variable to https://play.min.io:9000, with the AWS client configured properly, and IIRC it didn't work.
This is a good idea! If you can figure out how to do it with the normal AWS shell command (profile config??), it should be pretty easy to replicate that in the TW config options.
The docs say to use aws --endpoint-url http://server.url.here.tld.invalid s3 ls
Supposedly, there's also a way to do it by setting environment variables, or by editing the config file (manually?): https://aws.amazon.com/blogs/developer/new-improved-flexibility-when-configuring-endpoint-urls-with-the-aws-sdks-and-tools/
It looks like this can be included in a profile: https://docs.aws.amazon.com/cli/v1/userguide/cli-configure-files.html#cli-config-endpoint_url
If that works, then that might be the best option -- we can document how to set up the profile, but avoid adding purpose-specific config to TaskChampion and Taskwarrior.
If that doesn't work, could you check if a patch like this works?
diff --git src/server/cloud/aws.rs src/server/cloud/aws.rs
index 264b6824b..3c64460c3 100644
--- src/server/cloud/aws.rs
+++ src/server/cloud/aws.rs
@@ -67,16 +67,17 @@ impl AwsService {
bucket: String,
creds: AwsCredentials,
) -> Result<Self> {
let rt = Runtime::new()?;
let config =
rt.block_on(async {
let mut config_provider = aws_config::defaults(BehaviorVersion::v2024_03_28());
+ config_provider = config_provider.endpoint_url("https://foo.com");
match creds {
AwsCredentials::AccessKey {
access_key_id,
secret_access_key,
} => {
config_provider = config_provider.credentials_provider(
Credentials::from_keys(access_key_id, secret_access_key, None),
The profile approach doesn't seem to work: I get an "InvalidAccessKeyId" message, and I don't see any requests in my server's log.
I checked the patch: with it, task behaves as if I had set AWS_ENDPOINT_URL, which doesn't seem to work because of DNS. But DNS itself should be fine, since the official client seems to work?
$ ./task sync
unhandled error: dispatch failure: io error: error trying to connect: dns error: failed to lookup address information: No address associated with hostname: dns error: failed to lookup address information: No address associated with hostname: failed to lookup address information: No address associated with hostname
$ AWS_ENDPOINT_URL=https://s3.tilde.green task sync
unhandled error: dispatch failure: io error: error trying to connect: dns error: failed to lookup address information: No address associated with hostname: dns error: failed to lookup address information: No address associated with hostname: failed to lookup address information: No address associated with hostname
I've noticed that endpoint-url in the AWS CLI behaves differently than it does with the aws crate, which is what Taskwarrior uses: somehow I'm getting a DNS query for bucket.s3.tilde.green, when I have the endpoint set to s3.tilde.green.
OK, there are two ways to address a bucket at an endpoint: path style and subdomain style. By default MinIO uses path style, while things like the aws crate use subdomain style. The AWS CLI tool seemed to use path style, so maybe there needs to be another knob for that.
See: https://github.com/awslabs/aws-sdk-rust/issues/390
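To make the difference concrete, here is a small std-only Rust sketch showing which URL each addressing style produces for the same endpoint. The object_url helper, bucket name, and key are hypothetical, not part of TaskChampion or the AWS SDK. Subdomain (virtual-hosted) style moves the bucket into the hostname, which explains the DNS lookup for bucket.s3.tilde.green.

```rust
/// Hypothetical helper: the URL an S3 client would request for `bucket`/`key`.
/// Subdomain (virtual-hosted) style puts the bucket in the hostname, so the
/// server needs wildcard DNS; path style puts the bucket in the URL path.
fn object_url(endpoint: &str, bucket: &str, key: &str, path_style: bool) -> String {
    // Split "scheme://host[:port]" into its parts, ignoring a trailing slash.
    let trimmed = endpoint.trim_end_matches('/');
    let (scheme, authority) = trimmed.split_once("://").unwrap_or(("https", trimmed));
    if path_style {
        format!("{scheme}://{authority}/{bucket}/{key}")
    } else {
        format!("{scheme}://{bucket}.{authority}/{key}")
    }
}

fn main() {
    let endpoint = "https://s3.tilde.green";
    // Subdomain style: the client must resolve task-sync.s3.tilde.green.
    println!("{}", object_url(endpoint, "task-sync", "latest", false));
    // Path style: only s3.tilde.green needs to resolve.
    println!("{}", object_url(endpoint, "task-sync", "latest", true));
}
```

This is why a MinIO or Ceph instance without wildcard DNS only works once the client is switched to path style.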
So, after configuring my MinIO instance to use subdomain style, it works: it seems to have created 5 objects, taking up 2 MiB.
OK, it sounds like the earlier tests were with an incorrect configuration of minio, then? Could you verify whether the profile approach works?
Sorry, I wasn't clear enough. The profile approach doesn't work; it only works when I either recompile Taskwarrior with a TaskChampion that includes your patch, OR set the AWS_ENDPOINT_URL environment variable.
In both cases where it works, it uses subdomain style, not path style. I guess implementing support for path style isn't needed, but if it's wanted, look at https://github.com/awslabs/aws-sdk-rust/issues/390#issuecomment-1405378958, specifically the test at https://github.com/awslabs/aws-sdk-rust/blob/main/sdk/s3/tests/endpoints.rs#L36-L43.
OK, sounds like we'll need to add an endpoint URL config option, as requiring that an env variable be set when running task sync (much less requiring it when using TaskChampion programmatically) is pretty bad UX. I'm not sure what the best way to accomplish that in a non-breaking fashion will be. Perhaps adding a new AwsEndpoint variant to ServerConfig?
Yes, I guess? I'm not too well versed in this, so I'm not sure, but your idea LGTM.
Oh, one thing to check: does MinIO support the relatively recent conditional writes? If not, replicas could get out of sync when sync operations overlap in time. It seems so, but discussion there suggests it may be different. This is concerning because such de-synchronization would be rare, and therefore easy to miss. We should probably actually test each of these S3 emulators.
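To illustrate why conditional writes matter here, a toy in-memory model in std-only Rust. FakeBucket, put_if_absent, and the key names are hypothetical and do not reflect TaskChampion's actual storage layout. It contrasts an unconditional PUT, where the last writer silently wins, with a PUT that behaves like S3's If-None-Match: * precondition and rejects the second writer. A server that accepts the header but ignores it would behave like the first case, which is exactly what a test against each emulator would need to catch.

```rust
use std::collections::HashMap;

/// Hypothetical in-memory stand-in for a bucket.
#[derive(Default)]
struct FakeBucket {
    objects: HashMap<String, Vec<u8>>,
}

#[derive(Debug, PartialEq)]
enum PutError {
    /// What a real server signals when the precondition fails
    /// (typically HTTP 412 Precondition Failed).
    PreconditionFailed,
}

impl FakeBucket {
    /// Unconditional PUT: the last writer wins, and nobody is told.
    fn put(&mut self, key: &str, body: &[u8]) {
        self.objects.insert(key.to_string(), body.to_vec());
    }

    /// Conditional PUT modeled on `If-None-Match: *`:
    /// refuse to overwrite an object that already exists.
    fn put_if_absent(&mut self, key: &str, body: &[u8]) -> Result<(), PutError> {
        if self.objects.contains_key(key) {
            return Err(PutError::PreconditionFailed);
        }
        self.objects.insert(key.to_string(), body.to_vec());
        Ok(())
    }
}

fn main() {
    // Two replicas race to write the same (hypothetical) key.
    let mut bucket = FakeBucket::default();
    bucket.put("v2", b"from replica A");
    bucket.put("v2", b"from replica B"); // A's write is lost without any error
    assert_eq!(bucket.objects["v2"], b"from replica B".to_vec());

    let mut bucket = FakeBucket::default();
    assert_eq!(bucket.put_if_absent("v2", b"from replica A"), Ok(()));
    // B gets an error and can retry instead of clobbering A's write.
    assert_eq!(
        bucket.put_if_absent("v2", b"from replica B"),
        Err(PutError::PreconditionFailed)
    );
    println!("conditional write rejected the second writer");
}
```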
How popular are the various S3 replacements you mentioned in the first comment? And how difficult would it be to set them up for testing in a GitHub action?
Not sure whether MinIO supports conditional writes.
As for popularity and CI setup, I'm not sure. I just wanted to note that this could be something to do. I know some of them would require accounts and an internet connection, so maybe not the best idea (Tigris at least, as it doesn't seem to allow self-hosting, not that you'd necessarily want to).
This would be a good place to start: getting a test environment set up would be the first step, then ensuring that tests run locally and demonstrate that conditional writes actually work. Then, making the code changes, and figuring out how to test that in CI.
Hello, sorry to barge in, but it would be useful to also make "path-style" vs "subdomain" addressing configurable as part of this feature.
The reasoning being that self-hosting the "subdomain" setup tends to be a bit more complicated (requires a reverse proxy, wildcard TLS certificates...) and doesn't have the benefits it has for AWS (as self-hosted tends to be a single instance and in any case doesn't span across regions). It also makes the testing setup a bit easier, but I'm not sure that's relevant as you would want to test both variants in any case.
Good point - let's at least leave room for that in the configuration for the moment.
Support for S3-compatible providers in profile config has been a longstanding issue (requested in 2015, implemented in 2023) with a lot of discussion around it (https://github.com/aws/aws-cli/issues/1270).
In my experience, applications other than aws-cli don't make use of profiles at all, maybe because support for custom providers is recent, maybe because it's still cumbersome and requires writing an ini file that I struggle to recognize as such, for instance:
[profile local]
region=us-west-2
services=local-services

[services local-services]
dynamodb =
  endpoint_url=http://localhost:8080/
sfn =
  endpoint_url=http://localhost:8083/
As @D34DPlayer pointed out, many self-hosted MinIO instances aren't configured to make DNS-style bucket addressing work. It turns out that, as with the endpoint above, applications that support S3 alternatives have a way to specify "path-style" bucket addressing in their config.
There is no consensus on how to specify this parameter; boto3, for instance, is discussing allowing users to configure it via an environment variable, but the issue is still open: https://github.com/boto/botocore/issues/3307
As far as TaskChampion is concerned, I would add endpoint_url and path_style parameters to the Aws section of the ServerConfig, with defaults of https://s3.<region>.amazonaws.com/ and false respectively, so that it works out of the box with standard AWS.
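A minimal sketch of how those two optional settings and their defaults could fit together. AwsSettings and effective_endpoint are hypothetical names; only the field names endpoint_url and path_style and the default values come from the proposal above, and this is not TaskChampion's actual ServerConfig.

```rust
/// Hypothetical shape for the AWS sync settings discussed above.
#[derive(Debug, Clone)]
struct AwsSettings {
    region: String,
    /// `None` means "use the standard AWS endpoint for `region`".
    endpoint_url: Option<String>,
    /// `false` = subdomain (virtual-hosted) style, which AWS itself expects.
    path_style: bool,
}

impl AwsSettings {
    /// Defaults that work out of the box with standard AWS.
    fn new(region: &str) -> Self {
        AwsSettings {
            region: region.to_string(),
            endpoint_url: None,
            path_style: false,
        }
    }

    /// The endpoint the S3 client should talk to.
    fn effective_endpoint(&self) -> String {
        match &self.endpoint_url {
            Some(url) => url.clone(),
            None => format!("https://s3.{}.amazonaws.com/", self.region),
        }
    }
}

fn main() {
    // Plain AWS: nothing extra to configure.
    let aws = AwsSettings::new("eu-north-1");
    println!("{} (path_style = {})", aws.effective_endpoint(), aws.path_style);

    // Self-hosted MinIO without wildcard DNS: override both knobs.
    let minio = AwsSettings {
        endpoint_url: Some("https://play.min.io:9000".to_string()),
        path_style: true,
        ..AwsSettings::new("us-east-1")
    };
    println!("{} (path_style = {})", minio.effective_endpoint(), minio.path_style);
}
```

Keeping both fields optional with AWS-friendly defaults means existing AWS users would not need to change their configuration.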
As for how popular the various S3 replacements are: in general, popular enough that many providers implement the API. Joplin has a non-exhaustive list here: https://joplinapp.org/help/apps/sync/s3/ (Hetzner, for instance, is one provider that is not on the list).
Great, thank you! It sounds like there's a good path forward here. We can build a similar list of tested-and-supported providers, as we test each one.
Found this issue while attempting to use Ceph as an object store. Happy to test that provider once we have a patch!
(Ceph can be configured to support virtual-host-style paths, but I'm only set up for path_style, if that's something we want to document separately.)
For #599 we are thinking of a 3.0, which would permit breaking changes such as adding optional endpoint_url and path_style fields to the AwsEndpoint variant.
Would you like to try making a patch?
Not PR quality, obviously, since it's hardcoded :)
But just testing if the approach works:
diff --git a/src/server/cloud/aws.rs b/src/server/cloud/aws.rs
index 264b6824b..77e260f87 100644
--- a/src/server/cloud/aws.rs
+++ b/src/server/cloud/aws.rs
@@ -13,7 +13,7 @@ use aws_sdk_s3::{
use std::future::Future;
use tokio::runtime::Runtime;
-/// A [`Service`] implementation based on the Google Cloud Storage service.
+/// A [`Service`] implementation based on the AWS Simple Storage Service.
pub(in crate::server) struct AwsService {
client: s3::Client,
rt: Runtime,
@@ -98,7 +98,10 @@ impl AwsService {
.await
});
- let client = s3::client::Client::new(&config);
+ let s3_config = aws_sdk_s3::config::Builder::from(&config)
+ .force_path_style(true)
+ .build();
+ let client = aws_sdk_s3::Client::from_conf(s3_config);
Ok(Self { client, rt, bucket })
}
That patch, plus AWS_ENDPOINT_URL="https://rgw.apps.k8s.linux-fu.ninja" (just the environment variable) and this .taskrc:
sync.aws.bucket=task-sync-f174f643-b75e-4efb-8fc4-bd138c44d7c8
sync.aws.access_key_id=XVEAP4W7QBIB8REXIVXL
sync.aws.secret_access_key=hunter2
sync.aws.region=fake
Resulted in a successful (push) sync!
I'm struggling to get it to sync in the other direction, but I think things get weird when you swap between multiple sync sources with a single SQLite DB.
I imagine we'll want to make sync.aws.region optional if endpoint_url is specified.
Actually... if we're using default_credentials or even profile (anything but access_key_id), I think there should be some inferred region. Do we want it to be optional in all cases except access_key?
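One possible shape for the "region is optional when endpoint_url is set" rule, as a std-only Rust sketch. resolve_region is hypothetical, and the "us-east-1" placeholder is an assumption (the Ceph test above used "fake"). Some value is still needed because SigV4 request signing includes a region; this sketch ignores regions that a profile or default credentials might supply.

```rust
/// Hypothetical rule: an explicit region always wins; with a custom endpoint
/// and no region, fall back to a placeholder; with neither, it's an error.
fn resolve_region(region: Option<&str>, endpoint_url: Option<&str>) -> Result<String, String> {
    match (region, endpoint_url) {
        (Some(r), _) => Ok(r.to_string()),
        // Assumption: servers that ignore regions accept any signing region.
        (None, Some(_)) => Ok("us-east-1".to_string()),
        (None, None) => Err("sync.aws.region is required when no endpoint_url is set".to_string()),
    }
}

fn main() {
    for (region, endpoint) in [
        (Some("eu-west-1"), None),
        (None, Some("https://rgw.apps.k8s.linux-fu.ninja")),
        (None, None),
    ] {
        println!("{:?}", resolve_region(region, endpoint));
    }
}
```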
I have some very rough draft PRs -
- https://github.com/GothenburgBitFactory/taskchampion/pull/601/files
- https://github.com/GothenburgBitFactory/taskwarrior/pull/3924
Clearly work is needed. TODO:
- as CI is lighting up like a Christmas tree, fix all of that.
- Needs actual PR descriptions
- Needs doc updates in task-sync(5)
- Needs discussion on the fake/hardcoded region idea (and needs an example of the error you get when no region is set)
@travisby work on #600 just bumped the major version, meaning now is a good time to get those breaking changes in! Will you have time to wrap this up?
I've been able to make the latest changes and it feels like it's in a good place.. but!
Running our integration tests against:
- AWS
- Ceph
- Minio
We have success in AWS, but our tests fail on Ceph and Minio in two separate places, due to upstream bugs with those services.
I'm not sure whether there's another implementation this would work with that would make the feature worth having before those two upstream fixes land (or whether we want to implement workarounds).
There's some fun nuance for anyone else following along on this issue to check out in https://github.com/GothenburgBitFactory/taskchampion/pull/601
Sorry for letting this sit in my inbox for a while!
That's a bummer, but good to know that there's hope on the horizon. What if we merge the PR now to update the ServerConfig, so that's in place for 3.0.0, but document that at this time no non-AWS services are known to work, with a link to this issue for details.
Bonus points would be adding a bit to the documentation (or in a comment here, or somewhere people can find it) as to how to run the tests against a particular service. Then it's simple for someone else to either update Ceph or Minio, or try against another service, and find that all the tests pass -- at which point we can list that service / version as supported.
@travisby I think this is ready to merge, or just a few docs tweaks away from ready to merge, yes?