aws.signature
aws.signature copied to clipboard
locate_credentials() is slow
Example from https://github.com/cloudyr/aws.s3/issues/166
Report from Vitalina Komashko:
system.time(aws.s3::copy_object(from_object = from_object[1], to_object = to_object[1],
from_bucket = from_bucket, to_bucket = to_bucket, verbose = TRUE))
Using Environment Variable 'AWS_ACCESS_KEY_ID' for AWS Access Key ID
Using Environment Variable 'AWS_SECRET_ACCESS_KEY' for AWS Secret Access Key
Checking bucket region using get_location('<REDACTED>')
Executing request using bucket region us-west-1
S3 Request URL: https://s3-us-west-1.amazonaws.com/<REDACTED>
Executing request with AWS credentials
Parsing AWS API response
Success: (200) OK
user system elapsed
0.836 0.674 2437.777
This runs about 10x slower than the AWS CLI. Issue may be with locate_credentials()
:
/usr/bin/time -l aws s3 cp s3://<REDACTED> --profile corporate
copy: s3://<REDACTED> to s3://<REDACTED>
263.67 real 2.24 user 0.65 sys
Ideas for refactoring:
- [ ] convert
locate_credentials()
into a set of smaller functions called sequentially if previous function returns NULL - [x]
locate_credentials()
is called twice in a lot of packages, once explicitly and once implicitly. Disabling second call (by adding alocate = FALSE
tosignature_v4_auth()
will prevent the second search - [ ] add "requests" functionality that builds
GET()
,POST()
, etc calls and signs them automatically.
We also get a decent improvement on signature_v4_auth()
from v0.4.4:
> bench::mark(
+ is.list(signature_v4_auth("foo", "bar", verb = "GET", service = "s3", action = "/", request_body = "", canonical_headers = list("x-amz-region" = "foo")))
+ )
# A tibble: 1 x 14
expression min mean median max `itr/sec` mem_alloc n_gc n_itr total_time result memory time gc
<chr> <bch:> <bch:> <bch:> <bch:> <dbl> <bch:byt> <dbl> <int> <bch:tm> <list> <list> <lis> <lis>
1 "is.list(~ 15.1ms 23.9ms 25.5ms 35.6ms 41.8 35.1KB 1 20 478ms <lgl ~ <Rpro~ <bch~ <tib~
to v0.4.5:
> bench::mark(
+ is.list(signature_v4_auth("foo", "bar", verb = "GET", service = "s3", action = "/", request_body = "", canonical_headers = list("x-amz-region" = "foo"))),
+ is.list(signature_v4_auth("foo", "bar", verb = "GET", service = "s3", action = "/", request_body = "", canonical_headers = list("x-amz-region" = "foo"), force_credentials = TRUE))
+ )
# A tibble: 2 x 14
expression min mean median max `itr/sec` mem_alloc n_gc n_itr total_time result memory time gc
<chr> <bch:> <bch:> <bch:> <bch:> <dbl> <bch:byt> <dbl> <int> <bch:tm> <list> <list> <lis> <lis>
1 "is.list(~ 2.52ms 2.85ms 2.79ms 3.85ms 350. 1.38MB 9 151 431ms <lgl ~ <Rpro~ <bch~ <tib~
2 "is.list(~ 2.48ms 2.84ms 2.73ms 8.25ms 352. 19.88KB 9 149 423ms <lgl ~ <Rpro~ <bch~ <tib~
I had an issue with very slow loads, permeating even calling library(aws.s3)
, since that would load aws.signature
. On my end, the issue, was that check_ec2()
is the bottleneck, since it makes a URL call-out that I guess gets retried a bunch of times? And this is called in .onLoad
. Either way, it was taking a long time. I'm not sure if other people have this issue, as I only got it as long as aws.ec2metadata
was installed on a non-ec2 instances (for testing), which was of course a terrible idea.
However, removing aws.ec2metadata
didn't solve all my issues. I'll make a new issue hopefully outlining what I think is happening at least in my case.
I opened a ticket within the aws.ec2metadata repo here for the slow is_ec2()
call: https://github.com/cloudyr/aws.ec2metadata/issues/7
Hopefully with the 0.2.0 release of aws.ec2metadata
this should be improved. The timeout for checking metadata has been decreased from the curl default of 10s to 1s (to bring it inline with boto3).
Hey @jon-mago - that timeout will help. What would be better (for us) is if aws.signature followed the typical credential precedence structure (check .aws/credentials before Instance/Container IAM Roles).
Seems pretty consistent:
- CLI: https://docs.aws.amazon.com/cli/latest/userguide/cli-chap-configure.html
- Python: https://boto3.amazonaws.com/v1/documentation/api/latest/guide/configuration.html
- Java: https://docs.aws.amazon.com/sdk-for-java/v1/developer-guide/credentials.html
- Go: https://docs.aws.amazon.com/sdk-for-go/v1/developer-guide/configuring-sdk.html
Thoughts on why CloudyR is different, and maybe offerening the regular pattern as an option?
I'm not sure why it's different. I suspect the answer is merely "because it was written that way quite a while ago".
While it would be a kind of breaking change to "put the order right", I'm not averse to that, as I suspect it will break few things in practice. I will have a think on how to offer the standard order as an option, so it can be a more gradual transition.