aws.signature icon indicating copy to clipboard operation
aws.signature copied to clipboard

locate_credentials() is slow

Open leeper opened this issue 6 years ago • 7 comments

Example from https://github.com/cloudyr/aws.s3/issues/166

Report from Vitalina Komashko:

system.time(aws.s3::copy_object(from_object = from_object[1], to_object = to_object[1],
from_bucket = from_bucket, to_bucket = to_bucket, verbose = TRUE))
Using Environment Variable 'AWS_ACCESS_KEY_ID' for AWS Access Key ID
Using Environment Variable 'AWS_SECRET_ACCESS_KEY' for AWS Secret Access Key

Checking bucket region using get_location('<REDACTED>')
Executing request using bucket region us-west-1
S3 Request URL: https://s3-us-west-1.amazonaws.com/<REDACTED>
Executing request with AWS credentials
Parsing AWS API response
Success: (200) OK
    user   system  elapsed
   0.836    0.674 2437.777

This runs about 10x slower than the AWS CLI. Issue may be with locate_credentials():

/usr/bin/time -l aws s3 cp s3://<REDACTED> --profile corporate
copy: s3://<REDACTED> to s3://<REDACTED>

      263.67 real         2.24 user         0.65 sys

leeper avatar Jul 28 '18 07:07 leeper

Ideas for refactoring:

  • [ ] convert locate_credentials() into a set of smaller functions called sequentially if previous function returns NULL
  • [x] locate_credentials() is called twice in a lot of packages, once explicitly and once implicitly. Disabling second call (by adding a locate = FALSE to signature_v4_auth() will prevent the second search
  • [ ] add "requests" functionality that builds GET(), POST(), etc calls and signs them automatically.

leeper avatar Aug 07 '18 09:08 leeper

We also get a decent improvement on signature_v4_auth() from v0.4.4:

> bench::mark(
+   is.list(signature_v4_auth("foo", "bar", verb = "GET", service = "s3", action = "/", request_body = "", canonical_headers = list("x-amz-region" = "foo")))
+ )
# A tibble: 1 x 14
  expression    min   mean median    max `itr/sec` mem_alloc  n_gc n_itr total_time result memory time  gc   
  <chr>      <bch:> <bch:> <bch:> <bch:>     <dbl> <bch:byt> <dbl> <int>   <bch:tm> <list> <list> <lis> <lis>
1 "is.list(~ 15.1ms 23.9ms 25.5ms 35.6ms      41.8    35.1KB     1    20      478ms <lgl ~ <Rpro~ <bch~ <tib~

to v0.4.5:

> bench::mark(
+   is.list(signature_v4_auth("foo", "bar", verb = "GET", service = "s3", action = "/", request_body = "", canonical_headers = list("x-amz-region" = "foo"))),
+   is.list(signature_v4_auth("foo", "bar", verb = "GET", service = "s3", action = "/", request_body = "", canonical_headers = list("x-amz-region" = "foo"), force_credentials = TRUE))
+ )
# A tibble: 2 x 14
  expression    min   mean median    max `itr/sec` mem_alloc  n_gc n_itr total_time result memory time  gc   
  <chr>      <bch:> <bch:> <bch:> <bch:>     <dbl> <bch:byt> <dbl> <int>   <bch:tm> <list> <list> <lis> <lis>
1 "is.list(~ 2.52ms 2.85ms 2.79ms 3.85ms      350.    1.38MB     9   151      431ms <lgl ~ <Rpro~ <bch~ <tib~
2 "is.list(~ 2.48ms 2.84ms 2.73ms 8.25ms      352.   19.88KB     9   149      423ms <lgl ~ <Rpro~ <bch~ <tib~

leeper avatar Aug 07 '18 14:08 leeper

I had an issue with very slow loads, permeating even calling library(aws.s3), since that would load aws.signature. On my end, the issue, was that check_ec2() is the bottleneck, since it makes a URL call-out that I guess gets retried a bunch of times? And this is called in .onLoad. Either way, it was taking a long time. I'm not sure if other people have this issue, as I only got it as long as aws.ec2metadata was installed on a non-ec2 instances (for testing), which was of course a terrible idea.

However, removing aws.ec2metadata didn't solve all my issues. I'll make a new issue hopefully outlining what I think is happening at least in my case.

colman-humphrey avatar Apr 19 '19 20:04 colman-humphrey

I opened a ticket within the aws.ec2metadata repo here for the slow is_ec2() call: https://github.com/cloudyr/aws.ec2metadata/issues/7

micwalk avatar May 03 '19 18:05 micwalk

Hopefully with the 0.2.0 release of aws.ec2metadata this should be improved. The timeout for checking metadata has been decreased from the curl default of 10s to 1s (to bring it inline with boto3).

jon-mago avatar Jul 16 '19 08:07 jon-mago

Hey @jon-mago - that timeout will help. What would be better (for us) is if aws.signature followed the typical credential precedence structure (check .aws/credentials before Instance/Container IAM Roles).

Seems pretty consistent:

  • CLI: https://docs.aws.amazon.com/cli/latest/userguide/cli-chap-configure.html
  • Python: https://boto3.amazonaws.com/v1/documentation/api/latest/guide/configuration.html
  • Java: https://docs.aws.amazon.com/sdk-for-java/v1/developer-guide/credentials.html
  • Go: https://docs.aws.amazon.com/sdk-for-go/v1/developer-guide/configuring-sdk.html

Thoughts on why CloudyR is different, and maybe offerening the regular pattern as an option?

jyoungs avatar Jul 31 '19 23:07 jyoungs

I'm not sure why it's different. I suspect the answer is merely "because it was written that way quite a while ago".

While it would be a kind of breaking change to "put the order right", I'm not averse to that, as I suspect it will break few things in practice. I will have a think on how to offer the standard order as an option, so it can be a more gradual transition.

jon-mago avatar Aug 01 '19 09:08 jon-mago