tensorstore icon indicating copy to clipboard operation
tensorstore copied to clipboard

Further S3 Support Umbrella Issue

Open sjperkins opened this issue 1 year ago • 8 comments

This issue tracks remaining issues from the TODO list here: https://github.com/google/tensorstore/pull/91#issue-1649705676

Related:

  • https://github.com/google/tensorstore/issues/46
  • https://github.com/google/tensorstore/pull/91

KVStore

  • [ ] Versioned Bucket Support. From a high-level perspective might involve combining the x-amz-version-id and etag headers in StorageGenerations.
  • [ ] S3 specific rate-limiting (per endpoint?). Current S3 rate limiting is based on the GCS rate limiting code. Relevant documentation
    • https://docs.aws.amazon.com/AmazonS3/latest/userguide/optimizing-performance.html
    • https://aws.amazon.com/blogs/architecture/exponential-backoff-and-jitter/
    • https://docs.aws.amazon.com/sdkref/latest/guide/feature-retry-behavior.html

Testing

  • [x] ~~Upgrade to localstack 2.3 (release targeted for 26th Sep 2023)~~
    • ~~tensorstore::internal::TestKeyValueReadWriteOps currently succeeds locally on localstack/localstack:latest image (c.f. https://github.com/localstack/localstack/issues/9076#issuecomment-1711974643)~~
  • [ ] ~~Install localstack in github actions~~
  • [x] Moto as S3 testharness. https://github.com/google/tensorstore/commit/8a7f109b4b60b7371929a2c7467516e10671d8f3, https://github.com/google/tensorstore/commit/81aef761d36cff582169fe5b2ab452bcccb6079a
  • [ ] Benchmarking See https://github.com/google/tensorstore/pull/91#discussion_r1248172227

Authorisation

  • [x] #119
  • [x] #115.
    • https://github.com/google/tensorstore/commit/b9b90a21e8c8f7aab92f7699746d70f58890e01f

Logging

sjperkins avatar Sep 21 '23 08:09 sjperkins

As of https://github.com/google/tensorstore/commit/81aef761d36cff582169fe5b2ab452bcccb6079a we should be in a place where the "//tensorstore/kvstore/s3:localstack_test" will run using a hermetic moto (https://github.com/getmoto/moto) server.

We choose to use moto over localstack since it appears to have a smaller and somewhat less conflicting dependency tree.

laramiel avatar Nov 01 '23 23:11 laramiel

Thanks, have merged master into #119.

In terms of retry functionality, the following looks like a suitable reference implementation:

  • https://github.com/aws/aws-sdk-go/blob/4886ffca0e4f8f35e8abad033e87f0aa305b33cf/aws/client/default_retryer.go#L16
  • https://github.com/aws/aws-sdk-go/blob/4886ffca0e4f8f35e8abad033e87f0aa305b33cf/aws/client/default_retryer.go#L78-L123

I might have time to look at this in the second half of November, or, early December, unless there's something else that might be useful to look at first?

sjperkins avatar Nov 02 '23 08:11 sjperkins

I have linked "s3" into the default kvstore set; no special changes are required now.

laramiel avatar Nov 10 '23 00:11 laramiel

Hello, just dropping a note to say that my next window of opportunity for s3 work will be after March 2024,

sjperkins avatar Jan 25 '24 08:01 sjperkins

There was, at least, one issue related to auth on S3. I think that we may want to investigate using the AWS sdk (https://github.com/aws/aws-sdk-cpp/tree/main) for, at a minimum, auth.

  • For auth, we'd need to check that it could be integrated into our requests flow.
  • We'd need bazel and bazel_to_cmake configs to work; it would be possible to start with the tensorflow-io (https://github.com/tensorflow/io) bazel files; there appears to be ~5 dependencies required.
  • There does appear to be a mismatch in apis regarding cancellation; that seems less important for now.
  • Verify that it is not "too large".

laramiel avatar Feb 08 '24 20:02 laramiel

There was, at least, one issue related to auth on S3. I think that we may want to investigate using the AWS sdk (https://github.com/aws/aws-sdk-cpp/tree/main) for, at a minimum, auth.

Thanks for the headsup. I assume https://github.com/google/tensorstore/issues/135 is the issue? I just want to check, from scanning the issue there was a obvious, but easily fixed argument order bug (apologies looks like that was on me). Has this resolved the problem or is there an orthogonal issue related to incorporating KMS keys in the AWS4 signature?

If so, I can see why deferring to the SDK would be appealing -- the auth surface area is probably quite large.

Perhaps, one way to think about the heaviness of aws-sdk-cpp is to install the appropriate parts using vcpkg:

$ ./vcpkg install aws-sdk-cpp[core,s3,s3-encryption,kms]
...
$ cd installed/x64-linunx/lib
$ du -hsc lib*
424K	libaws-c-auth.a
124K	libaws-c-cal.a
496K	libaws-c-common.a
16K	libaws-c-compression.a
152K	libaws-c-event-stream.a
44K	libaws-checksums.a
668K	libaws-c-http.a
484K	libaws-c-io.a
540K	libaws-c-mqtt.a
3.4M	libaws-cpp-sdk-core.a
1.5M	libaws-cpp-sdk-kms.a
4.2M	libaws-cpp-sdk-s3.a
576K	libaws-cpp-sdk-s3-encryption.a
1.3M	libaws-crt-cpp.a
312K	libaws-c-s3.a
164K	libaws-c-sdkutils.a
11M	libcrypto.a
1.4M	libcurl.a
2.4M	libs2n.a
1.3M	libssl.a
144K	libz.a
30M	total

It may be possible to build the aws-sdk-cpp with boringssl instead of openssl as a dependency. Then, purely AWS code seems to be about 15M in total

$ du -hsc libaws-c*
424K	libaws-c-auth.a
124K	libaws-c-cal.a
496K	libaws-c-common.a
16K	libaws-c-compression.a
152K	libaws-c-event-stream.a
44K	libaws-checksums.a
668K	libaws-c-http.a
484K	libaws-c-io.a
540K	libaws-c-mqtt.a
3.4M	libaws-cpp-sdk-core.a
1.5M	libaws-cpp-sdk-kms.a
4.2M	libaws-cpp-sdk-s3.a
576K	libaws-cpp-sdk-s3-encryption.a
1.3M	libaws-crt-cpp.a
312K	libaws-c-s3.a
164K	libaws-c-sdkutils.a
15M	total

Maybe a more accurate estimate could be obtained by building a test binary that produces an Authorization header using AWSAuthV4Signer (for instance).

I may not get to this before end March/start April due to other commitments.

sjperkins avatar Feb 09 '24 08:02 sjperkins

If you think a videocon would be useful my contact details are here https://github.com/ratt-ru/arcae/blob/8c323116105d811f9718f258a09b52b68e7a8150/pyproject.toml#L5.

sjperkins avatar Feb 09 '24 12:02 sjperkins

I think cribbing from this is a good starting point.

  • https://github.com/tensorflow/io/blob/master/third_party/aws-sdk-cpp.BUILD

The dependencies are

  • curl (already in the tree)
  • boringssl (already in the tree)
  • https://github.com/tensorflow/io/blob/master/third_party/aws-c-event-stream.BUILD
  • https://github.com/tensorflow/io/blob/master/third_party/aws-c-common.BUILD
  • https://github.com/tensorflow/io/blob/master/third_party/aws-checksums.BUILD

That would be enough to get auth, probably. Then we could theoretically use any of the AWS auth services.

laramiel avatar Feb 09 '24 18:02 laramiel