velox
Add support for S3 bucket config
Allow all `hive.s3.*` options to be set on a per-bucket basis.
A bucket-specific option is set by replacing the `hive.s3.` prefix of an option
with `hive.s3.bucket.BUCKETNAME.`, where `BUCKETNAME` is the name of the bucket.
When connecting to a bucket, every bucket-specific option that is explicitly set overrides the corresponding base `hive.s3.` value; options not set for that bucket keep their base values.
These semantics are similar to those of the Apache Hadoop hadoop-aws module.
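The naming rule and override semantics can be illustrated with a minimal C++ sketch. This is not Velox's implementation: `bucketKey`, `baseKey`, and `resolveOption` are hypothetical helpers, the configuration is assumed to be a flat string-to-string map, and the option name `endpoint` is only illustrative.

```cpp
#include <optional>
#include <string>
#include <unordered_map>

// Hypothetical helper: the base key for an option, e.g. "hive.s3.endpoint".
std::string baseKey(const std::string& option) {
  return "hive.s3." + option;
}

// Hypothetical helper: the bucket-specific key, formed by replacing the
// "hive.s3." prefix with "hive.s3.bucket.<bucket>.".
std::string bucketKey(const std::string& bucket, const std::string& option) {
  return "hive.s3.bucket." + bucket + "." + option;
}

// Hypothetical lookup: an explicitly set bucket-specific value wins;
// otherwise fall back to the base hive.s3.* value; otherwise nothing.
std::optional<std::string> resolveOption(
    const std::unordered_map<std::string, std::string>& conf,
    const std::string& bucket,
    const std::string& option) {
  if (auto it = conf.find(bucketKey(bucket, option)); it != conf.end()) {
    return it->second;
  }
  if (auto it = conf.find(baseKey(option)); it != conf.end()) {
    return it->second;
  }
  return std::nullopt;
}
```

For example, with `hive.s3.endpoint` set globally and `hive.s3.bucket.logs.endpoint` set for a bucket named `logs`, `resolveOption(conf, "logs", "endpoint")` returns the `logs` override, while any other bucket falls back to the base endpoint.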
https://hadoop.apache.org/docs/current/hadoop-aws/tools/hadoop-aws/index.html
Spark uses this for ETL workloads that move data between two buckets.
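For a two-bucket job, each bucket ends up with its own effective option set: the base `hive.s3.*` values with that bucket's overrides layered on top. The sketch below builds that merged view. It is a hypothetical illustration, not Velox code; `effectiveS3Config` is an assumed name, and the configuration is assumed to be a flat string-to-string map.

```cpp
#include <map>
#include <string>
#include <unordered_map>

namespace {
const std::string kBasePrefix = "hive.s3.";
const std::string kBucketPrefix = "hive.s3.bucket.";

bool startsWith(const std::string& s, const std::string& prefix) {
  return s.compare(0, prefix.size(), prefix) == 0;
}
} // namespace

// Hypothetical: returns option name -> value as seen by one bucket.
std::map<std::string, std::string> effectiveS3Config(
    const std::unordered_map<std::string, std::string>& conf,
    const std::string& bucket) {
  std::map<std::string, std::string> result;
  const std::string ownPrefix = kBucketPrefix + bucket + ".";
  // Pass 1: base values (hive.s3.<option>), skipping all bucket overrides.
  for (const auto& [key, value] : conf) {
    if (startsWith(key, kBasePrefix) && !startsWith(key, kBucketPrefix)) {
      result[key.substr(kBasePrefix.size())] = value;
    }
  }
  // Pass 2: this bucket's explicitly set options replace the base values.
  // A separate pass is used because unordered_map iteration order is
  // unspecified, so overrides must be applied after all base values.
  for (const auto& [key, value] : conf) {
    if (startsWith(key, ownPrefix)) {
      result[key.substr(ownPrefix.size())] = value;
    }
  }
  return result;
}
```

Matching on the exact `hive.s3.bucket.<bucket>.` prefix keeps one bucket's overrides from leaking into another's view, although S3 bucket names may contain dots, so a bucket such as `a` would also see keys written for `a.b` in this simple sketch.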
Deploy Preview for meta-velox canceled.
| Name | Link |
|---|---|
| Latest commit | 877663d2af49f55ed04d60d308f252f024dbb0a4 |
| Latest deploy log | https://app.netlify.com/sites/meta-velox/deploys/6724fc5c7e38b700089c68a3 |
@kgpai has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
@kgpai merged this pull request in facebookincubator/velox@d76c05c7dd731b4e6269afca7e7b82498941131a.
Conbench analyzed the 1 benchmark run on commit d76c05c7.
There was 1 benchmark result indicating a performance regression:
- Commit Run on GitHub-runner-8-core at 2024-11-07 20:48:56Z
The full Conbench report has more details.
Thanks, @kgpai. CC: @zhouyuan, @FelixYBW. You can apply the following change in Gluten's ConfigExtractor.cc to all the S3 configs. I will be happy to open a PR for bucket-level config support after Velox is updated.
```diff
 using namespace facebook::velox::filesystems;
-hiveConfMap[facebook::velox::connector::hive::HiveConfig::kS3IamRole] = iamRole;
+hiveConfMap[S3Config::baseConfigKey(S3Config::Keys::kIamRole)] = iamRole;
```