velox icon indicating copy to clipboard operation
velox copied to clipboard

Add RetryStrategy for S3 file system

Open yma11 opened this issue 9 months ago • 9 comments

This PR add RetryStrategy support for s3 file system, thus it will retry when connection fails. It also upgrade aws sdk to 1.11.321 which supports AdaptiveRetryStrategy which user may choose to use.

For RetryStrategy, 2 configs are added:

hive.s3.max-attempts: Maximum attempts for connections to a single http client. hive.s3.retry-mode: 'standard', 'adaptive' or legacy, client will be created w/o retrystrategy if it's empty.

yma11 avatar May 07 '24 08:05 yma11

Deploy Preview for meta-velox canceled.

Name Link
Latest commit e927721d7a9587d0ed50c5c784bb3ddbce2684ae
Latest deploy log https://app.netlify.com/sites/meta-velox/deploys/6666ac033cbfe600090421b5

netlify[bot] avatar May 07 '24 08:05 netlify[bot]

@majetideepak Can you help review this PR? cc @FelixYBW

yma11 avatar May 07 '24 08:05 yma11

@yma11 When the call client_->HeadObject or client_->GetObject fail, can we get the the retry number from outcome? If so, let's print the number in error message. So user can know it does be retried and may increase retry number.

FelixYBW avatar May 08 '24 03:05 FelixYBW

@yma11 I will approve again once @FelixYBW comment is addressed.

Thanks @majetideepak, I have updated. Can you take a look again? There is a build failure in CI, which is caused by the old version aws sdk doesn't have GetRetryCount() API and seems version upgrade in this PR can't be reflected immediately.

yma11 avatar May 08 '24 12:05 yma11

There is a build failure in CI, which is caused by the old version aws sdk doesn't have GetRetryCount() API

@yma11 Let's update the library first in a separate PR (changes inside setup-adapters.sh). That will give us the newer CI images with the updated library.

majetideepak avatar May 08 '24 16:05 majetideepak

There is a build failure in CI, which is caused by the old version aws sdk doesn't have GetRetryCount() API

@yma11 Let's update the library first in a separate PR (changes inside setup-adapters.sh). That will give us the newer CI images with the updated library.

Thanks for suggestion. I have created 9756 for version upgrade. Will update this PR once it's merged.

yma11 avatar May 09 '24 01:05 yma11

@majetideepak can you approve this PR again? Thanks.

yma11 avatar May 14 '24 00:05 yma11

@yma11 some minor comments. Were you able to test this change on your local setup?

I don't have local env for this test for now. I will add corresponding configs at Gluten side and then test it in AWS.

yma11 avatar May 14 '24 11:05 yma11

I don't have local env for this test for now. I will add corresponding configs at Gluten side and then test it in AWS.

@yma11 thanks! Can you please confirm and update here? We cannot add CI tests for some of these changes, but the expectation is that the author tests the changes on their end.

majetideepak avatar May 14 '24 16:05 majetideepak

@yma11 thanks! Can you please confirm and update here? We cannot add CI tests for some of these changes, but the expectation is that the author tests the changes on their end.

Confirmed it works. Retry number is passed to S3 and the query passed.

FelixYBW avatar May 25 '24 02:05 FelixYBW

@majetideepak can you help ping anyone who can merge this PR? Thanks.

yma11 avatar Jun 03 '24 01:06 yma11

@pedroerp can you please help merge this? Thanks!

majetideepak avatar Jun 03 '24 09:06 majetideepak

@yma11 I pinged Pedro again. Can you please rebase with main? Sorry for the delay.

majetideepak avatar Jun 10 '24 04:06 majetideepak

@yma11 I pinged Pedro again. Can you please rebase with main? Sorry for the delay.

Done. Really thanks for driving merge.

yma11 avatar Jun 10 '24 10:06 yma11

@kgpai has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

facebook-github-bot avatar Jun 10 '24 17:06 facebook-github-bot

@kgpai merged this pull request in facebookincubator/velox@8c3630b35ea6aeeaca9d333a82a6fa79728b281a.

facebook-github-bot avatar Jun 15 '24 10:06 facebook-github-bot

Conbench analyzed the 1 benchmark run on commit 8c3630b3.

There were no benchmark performance regressions. 🎉

The full Conbench report has more details.

conbench-facebook[bot] avatar Jun 15 '24 11:06 conbench-facebook[bot]