aws.s3

Compatibility with alternative endpoints (e.g. MINIO) via AWS_S3_ENDPOINT

Open cboettig opened this issue 5 years ago • 6 comments

MINIO is a popular and powerful open source implementation of the AWS S3 API. Most major AWS clients (e.g. aws-cli, boto, etc.) are thus compatible with MINIO out of the box.

aws.s3 can already work with MINIO too, but some things are awkward because of assumptions that aws.s3 appears to enforce and that I haven't figured out how to work around. For instance, aws.s3 already supports alternative endpoints via the env var AWS_S3_ENDPOINT, but I cannot stop it from prepending AWS_DEFAULT_REGION to the endpoint. For example, I have a MINIO instance at https://data.ecoforecast.org, and I can trick aws.s3 into listing a bucket like this:

Sys.setenv(
           "AWS_DEFAULT_REGION" = "data",
           "AWS_S3_ENDPOINT" = "ecoforecast.org")

## test that all is good -- public bucket so example doesn't need authentication
aws.s3::get_bucket("forecasts")

and all is well. But it would be more intuitive to do something like:

Sys.setenv(
           "AWS_DEFAULT_REGION" = "",
           "AWS_S3_ENDPOINT" = "data.ecoforecast.org")

With that setting, I get a more surprising error:

Error in curl::curl_fetch_memory(url, handle = handle) : 
  Could not resolve host: us-east-1.data.ecoforecast.org
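
The two cases above can be reproduced with a small model. This is only a sketch of the observed behavior, not aws.s3's actual source: `compose_host()` and its `fallback` argument are hypothetical names, and the fallback to us-east-1 is inferred from the error message rather than read from the package code.

```r
# Hypothetical model of the observed behavior; NOT aws.s3's real code.
compose_host <- function(region, endpoint, fallback = "us-east-1") {
  # An empty or missing region appears to be replaced by a default
  # (assumption based on the error message).
  if (is.null(region) || !nzchar(region)) {
    region <- fallback
  }
  # The region is then prepended to the endpoint as a subdomain.
  paste0(region, ".", endpoint)
}

# Region "trick": yields the host that actually exists.
compose_host("data", "ecoforecast.org")

# Empty region: yields the host named in the error.
compose_host("", "data.ecoforecast.org")
```

If this model is right, an empty region can never mean "use the endpoint as-is", which is why splitting the hostname across the two variables is the only env-var combination that resolves.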

Is there a way to work around this? Thanks!

cboettig avatar Sep 07 '20 00:09 cboettig

Hi! I would highly appreciate Carl's suggestion. It used to work with environment variables before (about 10 months ago), but now it seems to default to us-east-1 when AWS_DEFAULT_REGION is empty. Thanks for a reply!

jenzopr avatar Oct 08 '20 10:10 jenzopr

I agree too. It would be nice to be able to configure alternative endpoints, for example a minio backend, through environment variables. The protocol could also be picked up from an environment variable and passed through to "use_https", which currently defaults to TRUE.

There is already "AWS_DEFAULT_REGION", and since the protocol cannot be set through AWS_S3_ENDPOINT (using aws.s3_0.3.21.tar.gz and something like AWS_S3_ENDPOINT=http://myminioserver.somewhere.org:9000), should there also be an "AWS_DEFAULT_PROTOCOL" that aws.s3 could use to set "use_https" rather than defaulting to TRUE?

Alternatively, would it be possible to add support for a connection string environment variable, AWS_S3_URI, that also includes the protocol and maybe even credentials? This can be quite convenient when testing locally against, for example, a minio service.

See for example how arrow configures a "connection string" against S3: https://ursalabs.org/arrow-r-nightly/articles/fs.html#file-systems-that-emulate-s3. It supports connections using a URI such as "s3://minioadmin:minioadmin@?scheme=http&endpoint_override=localhost%3A9000".

Another S3 client that allows for this "single environment variable" configuration of a connectionstring is "minio client", which supports usage like this "export MC_HOST_<alias>=https://<Access Key>:<Secret Key>@<YOUR-S3-ENDPOINT>"
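
To make the connection string idea concrete, here is a minimal sketch of how such a value could be split into its parts. Everything in it is an assumption for illustration: AWS_S3_URI is only proposed in this thread and is not read by aws.s3, `parse_s3_uri()` is a hypothetical helper, and the accepted shape (scheme://[key:secret@]host[:port]) loosely follows the minio client example rather than any specification.

```r
# Hypothetical helper: AWS_S3_URI is only a proposal in this thread,
# not an environment variable that aws.s3 reads.
parse_s3_uri <- function(uri) {
  # Expected shape: scheme://[key:secret@]host[:port]
  pattern <- "^(https?)://(([^:@/]+):([^@/]+)@)?([^/@]+)$"
  m <- regmatches(uri, regexec(pattern, uri))[[1]]
  if (length(m) == 0) {
    stop("not a recognised endpoint URI: ", uri)
  }
  list(
    use_https = identical(m[2], "https"),  # protocol as a flag
    key       = m[4],                      # "" when no credentials are given
    secret    = m[5],                      # "" when no credentials are given
    base_url  = m[6]                       # host, including any port
  )
}

# Example values taken from the arrow URI quoted above.
cfg <- parse_s3_uri("http://minioadmin:minioadmin@localhost:9000")
str(cfg)
```

A single variable like this would carry the protocol along with the host, which is the piece that AWS_S3_ENDPOINT alone cannot express.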

mskyttner avatar Dec 20 '21 10:12 mskyttner

For others stumbling across this issue, just wanted to note (as @mskyttner's comment hints at above) that arrow's S3FileSystem now supports most S3 operations quite well, and in a way that works very seamlessly with MINIO: https://arrow.apache.org/docs/r/articles/fs.html

cboettig avatar Dec 20 '21 18:12 cboettig

Bump. As of Feb 2023 this is still an issue. A workaround would be the minio.s3 package, but it is getting outdated and no longer works with R 4.2.2.

Wesseldr avatar Feb 15 '23 21:02 Wesseldr

+1 for the package to rely on AWS_DEFAULT_REGION and/or AWS_REGION as the default instead of us-east-1. Also in favour of some solution to the protocol issue.

goergen95 avatar Jun 26 '23 08:06 goergen95

FWIW, I've found it much easier to work directly with the minio client, whether I'm working with AWS S3, a MinIO system, or another provider. I've written an R package wrapper for this, feedback/bug reports appreciated: https://github.com/cboettig/minioclient

This is a thin wrapper that sidesteps the many issues that arise when trying to implement the low-level S3 API over HTTP requests, with all the pagination, XML parsing, signatures, etc. that must then be handled (with no disrespect to the maintainers here -- as an early contributor to this package I appreciate how hard it is to deal with these low-level things, but this repo has not seen commits for over 3 years).

cboettig avatar Jun 26 '23 17:06 cboettig