
storage: migrating to aws-sdk-go-v2

Open boraberke opened this issue 2 years ago • 2 comments

This PR migrates s5cmd from aws-sdk-go to aws-sdk-go-v2.

Useful links for future reference:

  1. GitHub: aws-sdk-go-v2
  2. Migration Guide
  3. Developer Guide

Changed files:

Major changes:
  1. s3.go
  2. s3_test.go
  3. util_test.go
  4. mock_s3.go
Minor fixes:
  1. cat_test.go
  2. cp_test.go
  3. mb_test.go
  4. rb_test.go
  5. run_test.go
  6. log.go

Important changes:

  1. Sessions have been removed in aws-sdk-go-v2, so s5cmd no longer has a sessionCache. It has a clientCache instead, which serves essentially the same purpose.
  2. The SDK no longer uses the AWS_SDK_LOAD_CONFIG environment variable. However, s5cmd still skips the config files if this variable is explicitly set to 0; otherwise it loads the default config files.
  3. v2 has no s3iface.S3API. s5cmd defines its own s3Client interface instead.
  4. Because the session structure has changed, the unit tests need changes as well. Instead of unit.session, mockgen is used to mock the s3Client interface, and middleware is used for some other tests such as TestS3Retry.
  5. The new SDK has no CredentialsChainVerboseErrors setting. There is a related issue here.
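
To make items 1 to 3 more concrete, here is a minimal, standard-library-only sketch of how a client cache, a narrow client interface, and the AWS_SDK_LOAD_CONFIG check could fit together. It is not the code from this PR: the method set of s3Client, the clientOptions fields, the fakeClient type, and the function names are all hypothetical, and the check treats only the literal value "0" as the opt-out, which is an assumption based on the description above.

```go
package main

import (
	"fmt"
	"sync"
)

// s3Client is a hypothetical narrow interface playing the role that
// s3iface.S3API had in v1. The single method is illustrative only.
type s3Client interface {
	BucketRegion(bucket string) (string, error)
}

// clientOptions is a hypothetical, comparable set of settings that
// identifies one client; it serves as the cache key.
type clientOptions struct {
	Endpoint string
	Region   string
}

// clientCache builds at most one client per distinct clientOptions value.
type clientCache struct {
	mu      sync.Mutex
	clients map[clientOptions]s3Client
}

// get returns the cached client for opts, calling build only on a miss.
func (c *clientCache) get(opts clientOptions, build func(clientOptions) (s3Client, error)) (s3Client, error) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if cl, ok := c.clients[opts]; ok {
		return cl, nil
	}
	cl, err := build(opts)
	if err != nil {
		return nil, err
	}
	if c.clients == nil {
		c.clients = make(map[clientOptions]s3Client)
	}
	c.clients[opts] = cl
	return cl, nil
}

// shouldLoadSharedConfig reports whether the default config files should be
// loaded. Assumption: only an explicit "0" disables loading.
func shouldLoadSharedConfig(getenv func(string) string) bool {
	return getenv("AWS_SDK_LOAD_CONFIG") != "0"
}

// fakeClient is a hand-written stand-in for a generated mock.
type fakeClient struct{ region string }

func (f fakeClient) BucketRegion(string) (string, error) { return f.region, nil }

func main() {
	cache := &clientCache{}
	builds := 0
	build := func(o clientOptions) (s3Client, error) {
		builds++
		return fakeClient{region: o.Region}, nil
	}
	opts := clientOptions{Region: "us-east-1"}
	first, _ := cache.get(opts, build)
	second, _ := cache.get(opts, build)
	// The second call is served from the cache, so build runs once.
	fmt.Println(first == second, builds)
}
```

Because the cache only depends on the s3Client interface, tests can hand it a fake or a mockgen-generated mock without constructing a real SDK client.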

Changes worth mentioning:

  1. The WithDisableRestProtocolURICleaning setting no longer exists, since v2 does not do any URL cleaning or joining.
  2. The new SDK supports many options for retry behavior. It might be useful to expose Backoff and/or RateLimiter as additional optional settings in the future.
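
As a rough illustration of what a configurable backoff would control, the sketch below computes a capped exponential delay per retry attempt. It uses only the standard library and does not call the SDK; backoffDelay and its parameters are hypothetical names, and the SDK's own backoff (which may add jitter) is not reproduced here.

```go
package main

import (
	"fmt"
	"time"
)

// backoffDelay returns the wait before the given retry attempt (1-based).
// The delay starts at base, doubles on each further attempt, and never
// exceeds max. This is an illustrative policy, not the SDK's implementation.
func backoffDelay(attempt int, base, max time.Duration) time.Duration {
	if attempt < 1 || base <= 0 {
		return 0
	}
	d := base
	for i := 1; i < attempt; i++ {
		d *= 2
		if d >= max {
			// Stop early so the doubling cannot overflow.
			return max
		}
	}
	if d > max {
		return max
	}
	return d
}

func main() {
	for attempt := 1; attempt <= 5; attempt++ {
		fmt.Println(attempt, backoffDelay(attempt, 100*time.Millisecond, time.Second))
	}
}
```

Exposing the base and maximum delay as optional settings would let users trade retry latency against load on the endpoint.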

boraberke, Aug 01 '22 08:08

Here are the benchmark results comparing master with this PR:

Benchmark summary:

| Scenarios | File Size | File Count |
|---|---|---|
| small files | 1M | 10000 |
| large file | 10G | 1 |
| very large file | 300G | 1 |


| Scenario | Summary |
|---|---|
| upload small files | 'PR:478' ran 1.01 ± 0.02 times faster than 'master' |
| download small files | 'PR:478' ran 1.00 ± 0.01 times faster than 'master' |
| remove small files | 'master' ran 1.05 ± 0.41 times faster than 'PR:478' |
| upload large file | 'PR:478' ran 1.18 ± 0.23 times faster than 'master' |
| download large file | 'master' ran 1.05 ± 0.08 times faster than 'PR:478' |
| remove large file | 'master' ran 1.21 ± 0.39 times faster than 'PR:478' |
| upload very large file | 'PR:478' ran 1.01 times faster than 'master' |
| download very large file | 'master' ran 1.02 times faster than 'PR:478' |
| remove very large file | 'PR:478' ran 1.13 times faster than 'master' |

Detailed summary:

| Scenario | Command | Mean [s] | Min [s] | Max [s] | Relative |
|---|---|---|---|---|---|
| upload small files | PR:478 | 9.117 ± 0.155 | 8.848 | 9.337 | 1.00 |
| upload small files | master | 9.252 ± 0.160 | 9.084 | 9.483 | 1.01 ± 0.02 |
| download small files | PR:478 | 79.992 ± 0.091 | 79.879 | 80.177 | 1.00 |
| download small files | master | 79.993 ± 0.462 | 79.096 | 81.028 | 1.00 ± 0.01 |
| remove small files | PR:478 | 2.603 ± 0.435 | 2.308 | 3.245 | 1.05 ± 0.41 |
| remove small files | master | 2.470 ± 0.878 | 2.012 | 3.787 | 1.00 |
| upload large file | PR:478 | 10.093 ± 1.491 | 9.043 | 14.222 | 1.00 |
| upload large file | master | 11.876 ± 1.486 | 10.730 | 15.787 | 1.18 ± 0.23 |
| download large file | PR:478 | 27.689 ± 1.378 | 25.979 | 30.803 | 1.05 ± 0.08 |
| download large file | master | 26.452 ± 1.667 | 24.891 | 29.375 | 1.00 |
| remove large file | PR:478 | 0.157 ± 0.029 | 0.122 | 0.210 | 1.21 ± 0.39 |
| remove large file | master | 0.130 ± 0.034 | 0.090 | 0.220 | 1.00 |
| upload very large file | PR:478 | 270.462 | 270.462 | 270.462 | 1.00 |
| upload very large file | master | 272.473 | 272.473 | 272.473 | 1.01 |
| download very large file | PR:478 | 2538.727 | 2538.727 | 2538.727 | 1.02 |
| download very large file | master | 2501.010 | 2501.010 | 2501.010 | 1.00 |
| remove very large file | PR:478 | 1.011 | 1.011 | 1.011 | 1.00 |
| remove very large file | master | 1.145 | 1.145 | 1.145 | 1.13 |
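
The Relative column appears consistent with dividing the slower mean by the faster one and propagating the two standard deviations to first order. The sketch below reproduces that arithmetic for the "upload large file" row; the function name relative is hypothetical, and the formula is an inference from the numbers above rather than something stated by the benchmark tool.

```go
package main

import (
	"fmt"
	"math"
)

// relative returns meanA/meanB and its first-order propagated uncertainty,
// assuming the two measurements are independent:
//   unc = ratio * sqrt((sdA/meanA)^2 + (sdB/meanB)^2)
func relative(meanA, sdA, meanB, sdB float64) (ratio, unc float64) {
	ratio = meanA / meanB
	unc = ratio * math.Hypot(sdA/meanA, sdB/meanB)
	return ratio, unc
}

func main() {
	// master (11.876 ± 1.486 s) relative to PR:478 (10.093 ± 1.491 s).
	r, u := relative(11.876, 1.486, 10.093, 1.491)
	fmt.Printf("%.2f ± %.2f\n", r, u) // prints "1.18 ± 0.23"
}
```

Several of the intervals (for example 1.05 ± 0.41 for removing small files) span 1.00, so those differences are within the measurement noise.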

boraberke, Aug 11 '22 09:08

Some updates about this PR:

  • aws-sdk-go-v2 does not work with Google Cloud Storage (GCS) out of the box. The issue can be seen here. A workaround is to remove the content-encoding header, but this may cause unwanted behavior, such as not being able to compress the content: GCS reads that header to decide whether to apply content encoding. Refer to here.
  • A decision needs to be made on how to continue: either remove content-encoding or wait until GCS supports aws-sdk-go-v2.

boraberke, Sep 16 '22 12:09

@boraberke Thanks for the PR and your comments. Closing this PR because of https://github.com/peak/s5cmd/pull/478#issuecomment-1249311139

sonmezonur, Mar 27 '23 07:03