s5cmd
storage: migrating to aws-sdk-go-v2
This PR migrates s5cmd from aws-sdk-for-go to aws-sdk-for-go-v2.
Useful links for future reference:
Changed files:
Major changes:
- s3.go
- s3_test.go
- util_test.go
- mock_s3.go
Minor fixes:
- cat_test.go
- cp_test.go
- mb_test.go
- rb_test.go
- run_test.go
- log.go
Important changes:
- Sessions have been removed in aws-sdk-go-v2. Because of this, `s5cmd` no longer has a `sessionCache`. Instead, it has a `clientCache`, which is essentially the same thing.
- The environment variable `AWS_SDK_LOAD_CONFIG` is no longer used by the SDK. However, `s5cmd` will still not load from config files if this variable is explicitly set to 0. Otherwise, it loads from the default config files.
- There is no `s3iface.S3API` in v2. Instead of `s3iface.S3API`, `s5cmd` has its own `s3Client` interface.
- As the `session` structure has changed, the unit tests also require changes. Instead of `unit.session`, mockgen is used to mock the `s3Client` interface, and middleware is used for some other tests such as `TestS3Retry`.
- There is no `CredentialsChainVerboseErrors` setting in the new SDK. There is an issue related to this here.
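To make the first three points more concrete, here is a minimal sketch using only the standard library. It is not the code in this PR: the `s3Client` method set, the `Options` key, the `fakeClient` double, and the `loadSharedConfig` helper are all hypothetical, and treating other false-like values the same as `0` is an assumption of the sketch.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"sync"
)

// s3Client is a hypothetical, heavily trimmed stand-in for the project-owned
// interface that replaces s3iface.S3API. The real interface would expose SDK
// operations with SDK input and output types.
type s3Client interface {
	Region() string
}

// fakeClient is a hand-written test double used only in this sketch;
// the PR generates mocks with mockgen instead.
type fakeClient struct{ region string }

func (f *fakeClient) Region() string { return f.region }

// Options is an assumed cache key: comparable fields that determine
// which client a command needs.
type Options struct {
	Region   string
	Endpoint string
}

// clientCache hands out one client per distinct Options value.
type clientCache struct {
	mu      sync.Mutex
	clients map[Options]s3Client
}

func newClientCache() *clientCache {
	return &clientCache{clients: make(map[Options]s3Client)}
}

// get returns the cached client for opts, building it with newClient
// on the first request only.
func (c *clientCache) get(opts Options, newClient func(Options) s3Client) s3Client {
	c.mu.Lock()
	defer c.mu.Unlock()
	if cl, ok := c.clients[opts]; ok {
		return cl
	}
	cl := newClient(opts)
	c.clients[opts] = cl
	return cl
}

// loadSharedConfig reports whether the default config files should be read.
// Only an explicitly set, false-like value (such as "0") turns loading off.
// Accepting every value strconv.ParseBool reads as false is an assumption.
func loadSharedConfig(value string, isSet bool) bool {
	if !isSet {
		return true
	}
	enabled, err := strconv.ParseBool(value)
	if err != nil {
		return true // unparsable values fall back to the default
	}
	return enabled
}

func main() {
	cache := newClientCache()
	build := func(o Options) s3Client { return &fakeClient{region: o.Region} }
	a := cache.get(Options{Region: "us-east-1"}, build)
	b := cache.get(Options{Region: "us-east-1"}, build)
	fmt.Println("same client reused:", a == b)

	value, isSet := os.LookupEnv("AWS_SDK_LOAD_CONFIG")
	fmt.Println("load shared config:", loadSharedConfig(value, isSet))
}
```

Keeping the interface owned by the project is what lets tests swap in a double (generated or hand-written) without depending on an SDK-provided interface package.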
Changes worth mentioning:
- There is no `WithDisableRestProtocolURICleaning` setting anymore, as v2 does not do any cleaning or URL joining.
- The new SDK supports many features for retry behavior. It might be useful to add `Backoff` and/or `RateLimiter` as additional optional values in the future.
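As a rough idea of what a configurable backoff could compute, here is a standard-library-only sketch of capped exponential backoff. The `exponentialBackoff` type and its fields are hypothetical and not part of s5cmd. The `BackoffDelay(attempt, err)` method shape mirrors what I understand the v2 SDK's backoff interface to look like, so treat that as an assumption and check it against the SDK before wiring anything in.

```go
package main

import (
	"fmt"
	"time"
)

// exponentialBackoff doubles the delay on every retry attempt, up to Max.
// It is a hypothetical type for illustration, not part of s5cmd or the SDK.
type exponentialBackoff struct {
	Base time.Duration // delay before the first retry
	Max  time.Duration // upper bound for any single delay
}

// BackoffDelay returns the wait before retry number attempt (0-based).
// The err argument is unused here; a real policy could inspect it.
func (b exponentialBackoff) BackoffDelay(attempt int, err error) (time.Duration, error) {
	delay := b.Base
	for i := 0; i < attempt; i++ {
		delay *= 2
		if delay >= b.Max {
			return b.Max, nil // stop early so the doubling cannot overflow
		}
	}
	if delay > b.Max {
		delay = b.Max
	}
	return delay, nil
}

func main() {
	policy := exponentialBackoff{Base: 100 * time.Millisecond, Max: 2 * time.Second}
	for attempt := 0; attempt < 6; attempt++ {
		delay, _ := policy.BackoffDelay(attempt, nil)
		fmt.Printf("attempt %d: wait %v\n", attempt, delay)
	}
}
```

The sketch is deterministic on purpose; a production policy would normally add jitter so that many concurrent workers do not retry in lockstep.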
Here are the benchmark results comparing master with this PR:
Benchmark summary:
Scenarios | File Size | File Count |
---|---|---|
small files | 1M | 10000 |
large file | 10G | 1 |
very large file | 300G | 1 |
Scenario | Summary |
---|---|
upload small files | 'PR:478' ran 1.01 ± 0.02 times faster than 'master' |
download small files | 'PR:478' ran 1.00 ± 0.01 times faster than 'master' |
remove small files | 'master' ran 1.05 ± 0.41 times faster than 'PR:478' |
upload large file | 'PR:478' ran 1.18 ± 0.23 times faster than 'master' |
download large file | 'master' ran 1.05 ± 0.08 times faster than 'PR:478' |
remove large file | 'master' ran 1.21 ± 0.39 times faster than 'PR:478' |
upload very large file | 'PR:478' ran 1.01 times faster than 'master' |
download very large file | 'master' ran 1.02 times faster than 'PR:478' |
remove very large file | 'PR:478' ran 1.13 times faster than 'master' |
Detailed summary:
Scenario | Command | Mean [s] | Min [s] | Max [s] | Relative |
---|---|---|---|---|---|
upload small files | PR:478 | 9.117 ± 0.155 | 8.848 | 9.337 | 1.00 |
upload small files | master | 9.252 ± 0.160 | 9.084 | 9.483 | 1.01 ± 0.02 |
download small files | PR:478 | 79.992 ± 0.091 | 79.879 | 80.177 | 1.00 |
download small files | master | 79.993 ± 0.462 | 79.096 | 81.028 | 1.00 ± 0.01 |
remove small files | PR:478 | 2.603 ± 0.435 | 2.308 | 3.245 | 1.05 ± 0.41 |
remove small files | master | 2.470 ± 0.878 | 2.012 | 3.787 | 1.00 |
upload large file | PR:478 | 10.093 ± 1.491 | 9.043 | 14.222 | 1.00 |
upload large file | master | 11.876 ± 1.486 | 10.730 | 15.787 | 1.18 ± 0.23 |
download large file | PR:478 | 27.689 ± 1.378 | 25.979 | 30.803 | 1.05 ± 0.08 |
download large file | master | 26.452 ± 1.667 | 24.891 | 29.375 | 1.00 |
remove large file | PR:478 | 0.157 ± 0.029 | 0.122 | 0.210 | 1.21 ± 0.39 |
remove large file | master | 0.130 ± 0.034 | 0.090 | 0.220 | 1.00 |
upload very large file | PR:478 | 270.462 | 270.462 | 270.462 | 1.00 |
upload very large file | master | 272.473 | 272.473 | 272.473 | 1.01 |
download very large file | PR:478 | 2538.727 | 2538.727 | 2538.727 | 1.02 |
download very large file | master | 2501.010 | 2501.010 | 2501.010 | 1.00 |
remove very large file | PR:478 | 1.011 | 1.011 | 1.011 | 1.00 |
remove very large file | master | 1.145 | 1.145 | 1.145 | 1.13 |
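For anyone reading the Relative column: the values are consistent with the ratio of the two means, with the uncertainty obtained by first-order error propagation of the two standard deviations. The sketch below recomputes the "upload large file" row from the means and deviations in the table and arrives at the reported 1.18 ± 0.23. The function name `relativeSpeed` is only illustrative, and that the benchmarking tool uses exactly this formula is an inference from the numbers.

```go
package main

import (
	"fmt"
	"math"
)

// relativeSpeed returns slowMean/fastMean and the propagated standard
// deviation of that ratio, given each command's mean and standard
// deviation in seconds.
func relativeSpeed(slowMean, slowStd, fastMean, fastStd float64) (ratio, std float64) {
	ratio = slowMean / fastMean
	// Relative errors of a quotient add in quadrature.
	std = ratio * math.Sqrt(math.Pow(slowStd/slowMean, 2)+math.Pow(fastStd/fastMean, 2))
	return ratio, std
}

func main() {
	// "upload large file": master 11.876 ± 1.486 s, PR:478 10.093 ± 1.491 s.
	ratio, std := relativeSpeed(11.876, 1.486, 10.093, 1.491)
	fmt.Printf("PR:478 ran %.2f ± %.2f times faster than master\n", ratio, std)
}
```

This also explains why rows such as "remove small files" (1.05 ± 0.41) should be read as no measurable difference: the uncertainty is several times larger than the gap. The very large file rows have no ± term, since their Mean, Min, and Max are identical.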
Some updates about this PR:
- aws-sdk-go-v2 doesn't natively work with Google Cloud Storage (GCS). The issue can be seen here. A workaround is possible by removing the `content-encoding` header, but this may cause unwanted behavior such as not being able to compress the content, because GCS reads that header to decide whether to apply content encoding. Refer to here.
- A decision needs to be made on how to continue with this: either remove `content-encoding` or wait until GCS supports aws-sdk-go-v2.
@boraberke Thanks for the PR and your comments. Closing this PR because of https://github.com/peak/s5cmd/pull/478#issuecomment-1249311139