HADOOP-18154. S3A Authentication to support WebIdentity
Description of PR
The PR addresses a requirement to comply with the AWS security concept of IAM roles for service accounts (IRSA) when operating a service that is not based on Apache Spark and that runs inside Amazon Elastic Kubernetes Service (EKS).
The code change consists of adding a new credentials provider class, org.apache.hadoop.fs.s3a.OIDCTokenCredentialsProvider, to the hadoop-aws module.
How was this patch tested?
No new unit test or integration test was created, on purpose. The patch was "only" tested against Hadoop release 2.10.1, as part of our specific use case based on Delta Sharing service v0.4.0, with the following Hadoop configuration (core-site.xml):
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>fs.s3a.aws.credentials.provider</name>
    <value>org.apache.hadoop.fs.s3a.OIDCTokenCredentialsProvider</value>
  </property>
  <property>
    <name>fs.s3a.jwt.path</name>
    <value>/var/run/secrets/eks.amazonaws.com/serviceaccount/token</value>
  </property>
  <property>
    <name>fs.s3a.assumed.role.arn</name>
    <value>my_iam_role_arn</value>
  </property>
  <property>
    <name>fs.s3a.assumed.role.session.name</name>
    <value>my_iam_session_name</value>
  </property>
  <property>
    <name>fs.s3a.server-side-encryption-algorithm</name>
    <value>SSE-KMS</value>
  </property>
  <property>
    <name>fs.s3a.server-side-encryption.key</name>
    <value>my_kms_key_id</value>
  </property>
</configuration>
For code changes:
- [X] The title of this PR starts with the corresponding JIRA issue 'HADOOP-18154'
- [ ] Object storage: have the integration tests been executed and the endpoint declared according to the connector-specific documentation?
- [X] No new dependency was added to the code.
Looking at the WebIdentityTokenCredentialsProvider, I see that if it doesn't get the parameters then it will fall back to environment variables. We absolutely do not want to be picking up env vars, as it will only create support issues where configurations only work on certain machines. (Actually, we can ignore the session name settings as they are harmless.)
I'm going to propose we go with @dannycjones's suggestion and support the whole set of values, with the prefix fs.s3a.webidentity for all of them.
For the ARN, we could have a property fs.s3a.webidentity.role.arn, but what should we do if it wasn't set?
- fail to initialize
- have that null value force the env var lookup.
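A minimal sketch of the first option (fail to initialize), to make the trade-off concrete. It is not code from this PR: the class name `WebIdentityConfigSketch` and the helper `requireRoleArn` are hypothetical, and a plain `java.util.Map` stands in for Hadoop's `Configuration` so the example stays self-contained. Only the property name `fs.s3a.webidentity.role.arn` comes from the proposal above.

```java
import java.util.Map;

/** Hypothetical sketch; not part of this PR or of hadoop-aws. */
final class WebIdentityConfigSketch {

  /** Property name proposed in the review comment above. */
  static final String ROLE_ARN = "fs.s3a.webidentity.role.arn";

  private WebIdentityConfigSketch() {
  }

  /**
   * Return the configured role ARN, or fail fast if it is unset or blank.
   * Rejecting the missing value here means no null is ever handed to a
   * downstream provider that might fall back to environment variables.
   * The Map is an assumption standing in for a Hadoop Configuration.
   */
  static String requireRoleArn(Map<String, String> conf) {
    String arn = conf.get(ROLE_ARN);
    if (arn == null || arn.trim().isEmpty()) {
      throw new IllegalArgumentException(
          "Missing required option " + ROLE_ARN);
    }
    return arn.trim();
  }
}
```

With this shape, a misconfigured deployment fails at initialization with a message naming the missing option, rather than working only on machines that happen to have the right environment variables set.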
I don't see any way to completely block the environment variable resolution, which is a pain.
I also see in the internal library classes that roles are sometimes set up with an external ID, but that is not possible here. Is that an issue?
So as well as authing with a web identity token, we could use https://docs.aws.amazon.com/STS/latest/APIReference/API_AssumeRoleWithWebIdentity.html to get role credentials for up to 12h, which could go into a delegation token.
- @mehakmeet
we need this feature, can somebody please help?
Hey @steveloughran. Could you please provide a status update on this PR? Do you plan to merge it, or is it already outdated?
@insider89 there have not been any changes on this PR since it was last reviewed. If the author takes up the PR, addresses those issues etc., then we can work on getting it into Hadoop.
I would suggest asking the author what the status is, and then working with them and other interested parties to get it into a state where the reviewers are happy. I am not set up to test this, so a key role of those who need this is to verify the patch works.
@jclarysse - I am looking for this same capability - we're deploying Trino / Hive Metastore (on top of Hadoop) in EKS, and of course service account -> IAM role mappings do not work as is.
Is this PR merged anywhere that we can build from and try out inside Hive Metastore? Thanks.
Note that hadoop-trunk will, once #5872 is in, move to AWS SDK 2 only, with the other credential providers. There will be support for v1 credential providers, but only if the v1 AWS SDK is explicitly added to the classpath.
- a PR to add web identity to trunk based on the AWS SDK 2 code is welcome, with docs and tests; one using V1 classes is not going to get in. Sorry.
- I'm not sure yet whether we will ever do another release with a 1.x AWS SDK; depends on timetables, motivation etc. Ideally we will be shipping a hadoop version on the v2 sdk later this year.
Trunk is on the AWS v2 SDK now; it's where features go in. Once something is in there, a patch for branch-3.3 may be considered; that would have to be a v1 implementation of the same feature, so I'd be reluctant to add it as it would be very different.