
HADOOP-19205. S3A: initialization/close slower than with v1 SDK

Open steveloughran opened this issue 8 months ago • 17 comments

HADOOP-19205

Adds a new ClientManager interface and implementation which creates the sync and async S3 clients and the S3 transfer manager on demand, and shuts them down in close().
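The on-demand creation and shutdown described above can be sketched in plain Java. This is a minimal sketch, assuming each service can be modelled as an `AutoCloseable` built by a factory. `SketchClientManager` and its methods are hypothetical names, not the interface added by this patch, and no AWS SDK types are used.

```java
import java.util.function.Function;
import java.util.function.Supplier;

/**
 * Hypothetical sketch of an on-demand client manager. Class and method names
 * are illustrative only; they are not the API added by this patch.
 */
final class SketchClientManager implements AutoCloseable {

  private final Supplier<AutoCloseable> syncFactory;
  private final Supplier<AutoCloseable> asyncFactory;
  // The transfer manager is built on top of the async client.
  private final Function<AutoCloseable, AutoCloseable> transferFactory;

  private AutoCloseable syncClient;
  private AutoCloseable asyncClient;
  private AutoCloseable transferManager;
  private boolean closed;

  SketchClientManager(Supplier<AutoCloseable> syncFactory,
      Supplier<AutoCloseable> asyncFactory,
      Function<AutoCloseable, AutoCloseable> transferFactory) {
    this.syncFactory = syncFactory;
    this.asyncFactory = asyncFactory;
    this.transferFactory = transferFactory;
  }

  /** Created on first call; later calls return the same instance. */
  synchronized AutoCloseable getSyncClient() {
    checkOpen();
    if (syncClient == null) {
      syncClient = syncFactory.get();
    }
    return syncClient;
  }

  synchronized AutoCloseable getAsyncClient() {
    checkOpen();
    if (asyncClient == null) {
      asyncClient = asyncFactory.get();
    }
    return asyncClient;
  }

  /** Creating the transfer manager is what triggers creation of the async client. */
  synchronized AutoCloseable getTransferManager() {
    checkOpen();
    if (transferManager == null) {
      transferManager = transferFactory.apply(getAsyncClient());
    }
    return transferManager;
  }

  /** Closes only what was created, dependents first; a second call is a no-op. */
  @Override
  public synchronized void close() {
    if (closed) {
      return;
    }
    closed = true;
    RuntimeException failure = null;
    for (AutoCloseable c : new AutoCloseable[] {transferManager, asyncClient, syncClient}) {
      if (c == null) {
        continue; // never created, nothing to shut down
      }
      try {
        c.close();
      } catch (Exception e) {
        if (failure == null) {
          failure = new RuntimeException("failed to close client", e);
        } else {
          failure.addSuppressed(e);
        }
      }
    }
    transferManager = null;
    asyncClient = null;
    syncClient = null;
    if (failure != null) {
      throw failure;
    }
  }

  private void checkOpen() {
    if (closed) {
      throw new IllegalStateException("client manager is closed");
    }
  }
}
```

Under these assumptions, `initialize()` would call `getSyncClient()` eagerly, while only rename would call `getTransferManager()`, so a filesystem instance that never renames never pays for the async client. Synchronizing each getter is the simplest way to guarantee a single instance per service.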

The S3A filesystem is modified to:

  • create one of these and hand it off to S3Store;
  • use the same ClientManager interface in S3Store to demand-create the services;
  • create the async client only as part of transfer manager creation, which happens during rename;
  • record statistics on client creation count and duration;
  • collect statistics on the time taken to initialize and shut down the S3A filesystem in IOStatistics, for reporting.
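The creation count/duration statistics can be illustrated with a small wrapper that times a factory call. The patch records these in Hadoop's IOStatistics; `CreationStats` below is a hypothetical plain-Java stand-in and does not reproduce that API.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Supplier;

/**
 * Hypothetical stand-in for client-creation statistics; not Hadoop's IOStatistics API.
 */
final class CreationStats {
  private final AtomicLong count = new AtomicLong();
  private final AtomicLong failures = new AtomicLong();
  private final AtomicLong totalNanos = new AtomicLong();

  /** Runs the factory, recording one invocation and its elapsed time, even on failure. */
  <T> T track(Supplier<T> factory) {
    long start = System.nanoTime();
    boolean ok = false;
    try {
      T result = factory.get();
      ok = true;
      return result;
    } finally {
      count.incrementAndGet();
      if (!ok) {
        failures.incrementAndGet();
      }
      totalNanos.addAndGet(System.nanoTime() - start);
    }
  }

  long count() { return count.get(); }

  long failures() { return failures.get(); }

  long totalNanos() { return totalNanos.get(); }

  /** Mean time per invocation in milliseconds, or 0 if nothing has been tracked yet. */
  double meanMillis() {
    long n = count.get();
    return n == 0 ? 0.0 : totalNanos.get() / 1_000_000.0 / n;
  }
}
```

The same wrapper could time the whole of initialize() or close(), which is the kind of measurement the last bullet describes.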

The S3 client is still created in FileSystem.initialize(); it is the async client which is created on demand.

No attempt is made to create the S3 client asynchronously in initialize(), though that could offer marginal benefits, depending on the codepath.
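For comparison, the asynchronous creation that this patch deliberately does not attempt could look like the following. This is a hypothetical sketch using `CompletableFuture`; `AsyncInitClient` is an illustrative name, and nothing here reflects actual S3A code.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.function.Supplier;

/**
 * Hypothetical: start building a client during initialize(), block on first use.
 * Not part of this patch.
 */
final class AsyncInitClient<T> {
  private final CompletableFuture<T> pending;

  AsyncInitClient(Supplier<T> factory, ExecutorService executor) {
    // Creation starts immediately on the executor; the caller does not wait.
    this.pending = CompletableFuture.supplyAsync(factory, executor);
  }

  /**
   * Blocks until the client is ready. A creation failure is rethrown here as a
   * CompletionException wrapping the original cause.
   */
  T get() {
    return pending.join();
  }
}
```

One reason the benefit may be marginal: creation only overlaps with whatever else initialize() does, and the first operation still blocks in `get()`. Failures also move from initialize() to first use, which changes where errors are reported.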

Change-Id: I79a668aacd920048447485afed77df573a38cb37

How was this patch tested?

Relying on the existing regression tests, since this codepath is exercised by them.

Some additional tests will be needed, e.g.:

  • verify that repeated creation always returns the same instance;
  • verify the behaviour after close().
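The two tests listed above might be sketched as follows, without a test framework so the block stays self-contained. `LazyHolder` is a hypothetical stand-in for a single demand-created service, not a class from the patch. The first check calls `get()` from several threads so that "always returns the same instance" also covers concurrent first use.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

/** Minimal hypothetical stand-in for one demand-created service. */
final class LazyHolder<T> {
  private final Supplier<T> factory;
  private T instance;
  private boolean closed;

  LazyHolder(Supplier<T> factory) { this.factory = factory; }

  synchronized T get() {
    if (closed) {
      throw new IllegalStateException("closed");
    }
    if (instance == null) {
      instance = factory.get();
    }
    return instance;
  }

  synchronized void close() {
    closed = true;
    instance = null;
  }
}

final class LazyHolderChecks {
  /** Repeated and concurrent calls must return one instance, created exactly once. */
  static void checkSameInstance() throws Exception {
    AtomicInteger created = new AtomicInteger();
    LazyHolder<Object> holder = new LazyHolder<>(() -> {
      created.incrementAndGet();
      return new Object();
    });
    ExecutorService pool = Executors.newFixedThreadPool(8);
    try {
      List<Future<Object>> results = new ArrayList<>();
      for (int i = 0; i < 32; i++) {
        results.add(pool.submit(holder::get));
      }
      Object first = results.get(0).get();
      for (Future<Object> f : results) {
        check(f.get() == first, "different instance returned");
      }
    } finally {
      pool.shutdown();
    }
    check(created.get() == 1, "factory invoked more than once");
  }

  /** After close(), use must fail and a second close() must be harmless. */
  static void checkAfterClose() {
    LazyHolder<Object> holder = new LazyHolder<>(Object::new);
    holder.get();
    holder.close();
    holder.close();
    try {
      holder.get();
      check(false, "get() after close() should fail");
    } catch (IllegalStateException expected) {
      // expected
    }
  }

  private static void check(boolean condition, String message) {
    if (!condition) {
      throw new AssertionError(message);
    }
  }
}
```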

For code changes:

  • [X] Does the title or this PR start with the corresponding JIRA issue id (e.g. 'HADOOP-17799. Your PR title ...')?
  • [x] Object storage: have the integration tests been executed and the endpoint declared according to the connector-specific documentation?
  • [ ] If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under ASF 2.0?
  • [ ] If applicable, have you updated the LICENSE, LICENSE-binary, NOTICE-binary files?

steveloughran, Jun 17 '24