Add support for S3 bucket config

Open majetideepak opened this issue 1 year ago • 1 comments

Allow all hive.s3 options to be set on a per-bucket basis. A bucket-specific option is set by replacing the hive.s3. prefix of an option with hive.s3.bucket.BUCKETNAME., where BUCKETNAME is the name of the bucket. When connecting to a bucket, any bucket-specific options that are explicitly set override the base hive.s3. values. These semantics are similar to those of the Apache Hadoop-AWS module (see https://hadoop.apache.org/docs/current/hadoop-aws/tools/hadoop-aws/index.html ). Spark uses this for ETL workloads between two buckets.
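To illustrate the override rule described above, here is a minimal sketch of the lookup order, assuming the config is a flat string-to-string map. The helper `resolveS3Option` and the alias `ConfigMap` are hypothetical names made up for this example; this is not the code in this PR, only one way the described semantics could be expressed.

```cpp
#include <optional>
#include <string>
#include <unordered_map>

// Hypothetical alias for illustration: a flat key/value config map.
using ConfigMap = std::unordered_map<std::string, std::string>;

// Hypothetical helper (not part of Velox): resolves one S3 option for a bucket.
// `option` is the option name without the hive.s3. prefix, e.g. "endpoint".
std::optional<std::string> resolveS3Option(
    const ConfigMap& config,
    const std::string& bucket,
    const std::string& option) {
  // 1. A bucket-specific value, hive.s3.bucket.BUCKETNAME.<option>, wins if set.
  auto it = config.find("hive.s3.bucket." + bucket + "." + option);
  if (it != config.end()) {
    return it->second;
  }
  // 2. Otherwise fall back to the base value, hive.s3.<option>.
  it = config.find("hive.s3." + option);
  if (it != config.end()) {
    return it->second;
  }
  // 3. Neither key is set.
  return std::nullopt;
}
```

With `hive.s3.endpoint` and `hive.s3.bucket.source.endpoint` both set, a lookup of `endpoint` for the bucket `source` would return the bucket-specific value, while a lookup for any other bucket would return the base value. The bucket name and values here are placeholders.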

majetideepak avatar Oct 22 '24 11:10 majetideepak

Deploy Preview for meta-velox canceled.

Name Link
Latest commit 877663d2af49f55ed04d60d308f252f024dbb0a4
Latest deploy log https://app.netlify.com/sites/meta-velox/deploys/6724fc5c7e38b700089c68a3

netlify[bot] avatar Oct 22 '24 11:10 netlify[bot]

@kgpai has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

facebook-github-bot avatar Nov 01 '24 21:11 facebook-github-bot

@kgpai merged this pull request in facebookincubator/velox@d76c05c7dd731b4e6269afca7e7b82498941131a.

facebook-github-bot avatar Nov 07 '24 20:11 facebook-github-bot

Conbench analyzed the 1 benchmark run on commit d76c05c7.

There was 1 benchmark result indicating a performance regression:

The full Conbench report has more details.

conbench-facebook[bot] avatar Nov 07 '24 20:11 conbench-facebook[bot]

Thanks, @kgpai. CC: @zhouyuan, @FelixYBW. You can use the following change in the Gluten ConfigExtractor.cc for all the configs. I will be happy to open a PR for the bucket-level config support after Velox is updated.

using namespace facebook::velox::filesystems;

- hiveConfMap[facebook::velox::connector::hive::HiveConfig::kS3IamRole] = iamRole;
+ hiveConfMap[S3Config::baseConfigKey(S3Config::Keys::kIamRole)] = iamRole;

majetideepak avatar Nov 07 '24 21:11 majetideepak