velox
velox copied to clipboard
Add RetryStrategy for S3 file system
This PR add RetryStrategy
support for s3 file system, thus it will retry when connection fails. It also upgrade aws sdk to 1.11.321
which supports AdaptiveRetryStrategy
which user may choose to use.
For RetryStrategy
, 2 configs are added:
hive.s3.max-attempts
: Maximum attempts for connections to a single http client.
hive.s3.retry-mode
: 'standard', 'adaptive' or legacy
, client will be created w/o retrystrategy if it's empty.
Deploy Preview for meta-velox canceled.
Name | Link |
---|---|
Latest commit | e927721d7a9587d0ed50c5c784bb3ddbce2684ae |
Latest deploy log | https://app.netlify.com/sites/meta-velox/deploys/6666ac033cbfe600090421b5 |
@majetideepak Can you help review this PR? cc @FelixYBW
@yma11 When the call client_->HeadObject or client_->GetObject fail, can we get the the retry number from outcome? If so, let's print the number in error message. So user can know it does be retried and may increase retry number.
@yma11 I will approve again once @FelixYBW comment is addressed.
Thanks @majetideepak, I have updated. Can you take a look again? There is a build failure in CI, which is caused by the old version aws sdk doesn't have GetRetryCount()
API and seems version upgrade in this PR can't be reflected immediately.
There is a build failure in CI, which is caused by the old version aws sdk doesn't have
GetRetryCount()
API
@yma11 Let's update the library first in a separate PR (changes inside setup-adapters.sh). That will give us the newer CI images with the updated library.
There is a build failure in CI, which is caused by the old version aws sdk doesn't have
GetRetryCount()
API@yma11 Let's update the library first in a separate PR (changes inside setup-adapters.sh). That will give us the newer CI images with the updated library.
Thanks for suggestion. I have created 9756 for version upgrade. Will update this PR once it's merged.
@majetideepak can you approve this PR again? Thanks.
@yma11 some minor comments. Were you able to test this change on your local setup?
I don't have local env for this test for now. I will add corresponding configs at Gluten side and then test it in AWS.
I don't have local env for this test for now. I will add corresponding configs at Gluten side and then test it in AWS.
@yma11 thanks! Can you please confirm and update here? We cannot add CI tests for some of these changes, but the expectation is that the author tests the changes on their end.
@yma11 thanks! Can you please confirm and update here? We cannot add CI tests for some of these changes, but the expectation is that the author tests the changes on their end.
Confirmed it works. Retry number is passed to S3 and the query passed.
@majetideepak can you help ping anyone who can merge this PR? Thanks.
@pedroerp can you please help merge this? Thanks!
@yma11 I pinged Pedro again. Can you please rebase with main? Sorry for the delay.
@yma11 I pinged Pedro again. Can you please rebase with main? Sorry for the delay.
Done. Really thanks for driving merge.
@kgpai has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
@kgpai merged this pull request in facebookincubator/velox@8c3630b35ea6aeeaca9d333a82a6fa79728b281a.
Conbench analyzed the 1 benchmark run on commit 8c3630b3
.
There were no benchmark performance regressions. 🎉
The full Conbench report has more details.