incubator-uniffle
Support storing shuffle data to secured dfs cluster
What changes were proposed in this pull request?
Support storing shuffle data to a secured HDFS cluster under the Spark job user's own permissions on the shuffle server side.
Why are the changes needed?
When using the storage type MEMORY_LOCALFILE_HDFS, we hit problems flushing shuffle data to secured HDFS clusters due to missing credentials. To solve this, keytabs need to be distributed to all shuffle servers and logged in/refreshed by crontab or other means.
But this approach is not secure; it's better to flush data as the corresponding HDFS user, for data isolation.
We hope that:
- user A launches a Spark application and sends shuffle data to the shuffle server.
- Shuffle server should write shuffle data with "user A" into HDFS.
- Reduce tasks (launched by user A) could read shuffle data in HDFS with user A.

Otherwise, user A may not have permission to read shuffle data written by the shuffle server (another user) if it is not owned by A.
More detail and motivation can be found in design doc: https://docs.google.com/document/d/1pIDwCwv8iwnXmFQeTZKA5tK55SRc_f0SPKYFONA5K70
Does this PR introduce any user-facing change?
Yes
How was this patch tested?
Manual tests and unit tests.
@zuston To access the secured DFS cluster:
- For the Spark client, it can rely on Spark's implementation and read data according to the delegation token.
- For the shuffle server, the Hadoop conf can be updated with security enabled when writing data to HDFS.

What's the advantage of adding HadoopAccessorProvider?
@colinmjj @jerqi
For the Spark client, it can rely on Spark's implementation and read data according to the delegation token.
Yes. There is no need to retrieve the FileSystem via a UGI proxy user; the credentials have already been fetched on the Spark driver side at startup.
For the shuffle server, the Hadoop conf can be updated with security enabled when writing data to HDFS. What's the advantage of adding HadoopAccessorProvider?
Now the shuffle server can't access a secured cluster because it lacks a keytab.
To solve this, I introduced some configs so that the shuffle server can log in and impersonate the job user when writing HDFS files, via the proxy-user mechanism.
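The shuffle-server-side login is driven by a few Kerberos-related options. A hedged sketch of what such a configuration might look like, in Uniffle's key-value conf style; only `rss.security.hadoop.kerberos.relogin.interval.sec` is named elsewhere in this thread, so treat the other key names and all values as illustrative assumptions:

```properties
# Enable Kerberos access on the shuffle server (illustrative key name)
rss.security.hadoop.kerberos.enable true
# Principal and keytab used by the shuffle server to log in as the super user,
# which then impersonates job users via the proxy-user mechanism (illustrative)
rss.security.hadoop.kerberos.principal rss/_HOST@EXAMPLE.COM
rss.security.hadoop.kerberos.keytab.file /etc/security/keytabs/rss.keytab
# Relogin interval in seconds (this key is mentioned in the PR changelog)
rss.security.hadoop.kerberos.relogin.interval.sec 600
```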
Have you tried the following ways to update the Hadoop configuration?
- Option 1: in the shuffle server, conf entries prefixed with rss.server.hadoop will be added to the Hadoop configuration for data writing.
- Option 2: the coordinator can manage the information for different HDFS clusters; you can check the related API rpc fetchRemoteStorage(FetchRemoteStorageRequest) for more detail.

For the keytab problem, do you mean it can't exist on the shuffle server?
We may not need to support secured HDFS in Uniffle: DFSClient creates a UGI and spawns a renewal thread automatically when the shuffle server gets a FileSystem. Normally you can use the hdfs CLI to read/write files in a secured HDFS if the Kerberos-related config under HADOOP_HOME is right. Could you check the code in DFSClient and UserGroupInformation, and test against a secured HDFS without any code change? @zuston
I think what @zuston mentioned here is the case, for example:
- user A launches a Spark application and sends shuffle data to the shuffle server.
- Shuffle server should write shuffle data with "user A" into HDFS.
- Reduce tasks (launched by user A) could read shuffle data in HDFS with user A.
Otherwise, user A may not have permission to read shuffle data written by the shuffle server (another user) if it is not owned by A.
I'm not sure if I understand correctly or not? @zuston
@jerryshao You have caught my thoughts and thanks very much for explaining it in detail. Sorry I didn't explain clearly in advance.
So I think using the super user to delegate proxy user will be better in access control.
Got your points. I think this feature is necessary to support proxy users when using secured HDFS.
Now Uniffle already supports multiple HDFS clusters. Is it possible to use a secured HDFS cluster and an unsecured HDFS cluster together? Some configuration may need to be added to RemoteStorageConfItem.
https://github.com/apache/incubator-uniffle/blob/0f6a896efbd3a435de5e7a5d28843ecd05c38bde/proto/src/main/proto/Rss.proto#L360
I think the logged-in UGI can get the unsecured HDFS filesystem, but this case is not covered by the current test cases; maybe I can add it later.
Codecov Report
Merging #53 (3a01bf3) into master (bc621c3) will increase coverage by 0.35%. The diff coverage is 72.58%.
@@ Coverage Diff @@
## master #53 +/- ##
============================================
+ Coverage 58.00% 58.36% +0.35%
- Complexity 1231 1260 +29
============================================
Files 152 157 +5
Lines 8237 8384 +147
Branches 771 779 +8
============================================
+ Hits 4778 4893 +115
- Misses 3215 3240 +25
- Partials 244 251 +7
| Impacted Files | Coverage Δ | |
|---|---|---|
| ...he/uniffle/client/impl/ShuffleWriteClientImpl.java | 25.37% <0.00%> (-0.59%) | :arrow_down: |
| ...g/apache/uniffle/coordinator/CoordinatorUtils.java | 65.21% <ø> (+4.10%) | :arrow_up: |
| ...pache/uniffle/server/ShuffleServerGrpcService.java | 1.00% <0.00%> (-0.01%) | :arrow_down: |
| ...he/uniffle/server/buffer/ShuffleBufferManager.java | 82.25% <ø> (ø) | |
| ...org/apache/uniffle/storage/common/HdfsStorage.java | 0.00% <0.00%> (ø) | |
| ...storage/handler/impl/HdfsShuffleDeleteHandler.java | 0.00% <0.00%> (ø) | |
| ...e/storage/handler/impl/HdfsShuffleReadHandler.java | 51.02% <ø> (ø) | |
| ...rage/request/CreateShuffleWriteHandlerRequest.java | 0.00% <0.00%> (ø) | |
| ...ache/uniffle/storage/util/ShuffleStorageUtils.java | 64.13% <ø> (+3.13%) | :arrow_up: |
| .../apache/uniffle/coordinator/CoordinatorServer.java | 65.59% <33.33%> (-3.46%) | :arrow_down: |
| ... and 21 more | | |
Updated. @jerqi.
Besides, I think I will introduce some integration tests later.
What changes were proposed in this pull request?
Support storing shuffle data to secured dfs cluster
Why are the changes needed?
Now Uniffle doesn't support accessing shuffle data on a secured DFS cluster.
Does this PR introduce any user-facing change?
Yes
How was this patch tested?
Manual tests.
Should we provide more detailed description?
@zuston To access secured DFS, I think the process should be:
- get the remote storage from the coordinator
- register the remote storage to the shuffle server with user info
- shuffle server stores the mapping appId -> user
- when the shuffle server writes to DFS with the write handler, get the user info and write data in proxy mode via doAs()

What do you think?
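The appId -> user mapping in the steps above can be sketched with a plain concurrent map. This is a hypothetical stand-in, not Uniffle's actual class (the name AppUserRegistry and the fallback behavior are assumptions); in the real write path the HDFS write would run inside UserGroupInformation.createProxyUser(user, loginUgi).doAs(...):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of the appId -> user mapping kept by the shuffle server.
// In the PR the user travels in the register request, and the write handler
// would wrap the HDFS write in UGI.doAs() to impersonate that user.
public class AppUserRegistry {
    private static final Map<String, String> APP_TO_USER = new ConcurrentHashMap<>();

    // Called when an app registers its remote storage with user info.
    public static void register(String appId, String user) {
        APP_TO_USER.put(appId, user);
    }

    // Called by the write handler to pick the user to impersonate.
    public static String getUser(String appId) {
        // Fall back to the server's own user if the app never registered one.
        return APP_TO_USER.getOrDefault(appId, System.getProperty("user.name"));
    }

    // Cleanup when the app finishes, so the map does not grow unbounded.
    public static void unregister(String appId) {
        APP_TO_USER.remove(appId);
    }
}
```

A write handler bound to one app would then call getUser(appId) once at creation time and reuse it for every flush of that app's data.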
Access to secured clusters and isolation of user shuffle data are orthogonal; you can still do the isolation in a non-secure cluster (using the proxy-user way in your PR). Could you please write the PR's description in more detail, and write a design doc describing your motivation, design, workflow, compatibility, and test cases in different scenarios, since this is a huge and critical change? @zuston
@zuston To access secured DFS, I think the process should be:
- get the remote storage from the coordinator
- register the remote storage to the shuffle server with user info
- shuffle server stores the mapping appId -> user
- when the shuffle server writes to DFS with the write handler, get the user info and write data in proxy mode via doAs()

What do you think?
Got your points.
As far as I know, HdfsStorage is shared across the entire JVM. So now the user will only be added to CreateShuffleWriteHandlerRequest, which is bound to the app's read/write operations. I think the above changes are reflected in the latest commit.
Access to secured clusters and isolation of user shuffle data are orthogonal; you can still do the isolation in a non-secure cluster (using the proxy-user way in your PR).
So the proxy user could also be used to write data to an unsecured DFS cluster, to achieve data isolation? Please let me know whether I am right.
Could you please help to write the pr's description in more detail, and write a design doc to describe your motivation, design, workflow, compatibility, and test case in different scenarios since it's a huge and critical change.
Do I need to introduce a new page to submit a proposal, like an RFC? Or just write it in the description? Or open a new issue to track and discuss?
@duanmeng
So the proxy user could also be used to write data to an unsecured DFS cluster, to achieve data isolation? Please let me know whether I am right.
Yes, you're right. Say user B cannot access the path created by user A; they are both proxy users of the user running the Uniffle cluster.
Do I need to introduce a new page to submit a proposal, like an RFC? Or just write it in the description? Or open a new issue to track and discuss?
I suggest you enrich your description and add your design doc's (google doc or tencent doc) link to it.
Updated the description and submitted a simple proposal (https://docs.google.com/document/d/1pIDwCwv8iwnXmFQeTZKA5tK55SRc_f0SPKYFONA5K70). Feel free to discuss or comment further.
@jerqi @jerryshao @duanmeng @colinmjj
Besides, it would be better if the Uniffle community had a public confluence wiki for proposals, like Flink's: https://cwiki.apache.org/confluence/display/FLINK/Flink+Improvement+Proposals
If you have time, could you help review this proposal? @jerqi
If you have time, could you help review this proposal? @jerqi
I have left some comments.
If you have time, could you help review this proposal? @jerqi
I have left some comments.
Addressed the comments.
@zuston Did you commit the latest source? Some previous comments seem not to be fixed, e.g., user shouldn't be a member of RemoteStorage:
message RemoteStorage {
string path = 1;
repeated RemoteStorageConfItem remoteStorageConf = 2;
string user = 3;
}
@zuston Did you commit the latest source? Some previous comments seem not to be fixed, e.g., user shouldn't be a member of RemoteStorage:
message RemoteStorage { string path = 1; repeated RemoteStorageConfItem remoteStorageConf = 2; string user = 3; }
Sorry, I forgot to submit, but I think I can push a new commit tomorrow.
I tested this PR in our internal env.
- [x] coordinator reads the exclude-node file from kerberized HDFS
- [x] shuffle server writes shuffle data to kerberized HDFS and the client reads it directly
Changelog
- Add the check of rss.security.hadoop.kerberos.relogin.interval.sec
- Add more test cases about kerberos HDFS cluster
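The relogin interval check added in this changelog presumably drives a periodic Kerberos relogin; in Hadoop that is typically done via UserGroupInformation's relogin-from-keytab support. A stdlib-only sketch of the scheduling part, under the assumption that a single relogin task runs at a fixed interval (the class name and the counter stand-in are hypothetical, not Uniffle code):

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch of a periodic relogin scheduler. In a real shuffle server the
// scheduled Runnable would perform the Hadoop UGI relogin; here a counter
// stands in so the sketch stays Hadoop-free and runnable.
public class KerberosReloginScheduler {
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();
    final AtomicInteger reloginAttempts = new AtomicInteger();

    // intervalSec mirrors rss.security.hadoop.kerberos.relogin.interval.sec;
    // the check added in the PR would reject non-positive values like this.
    public void start(long intervalSec) {
        if (intervalSec <= 0) {
            throw new IllegalArgumentException("relogin interval must be positive");
        }
        scheduler.scheduleAtFixedRate(
                reloginAttempts::incrementAndGet, // stand-in for the relogin call
                intervalSec, intervalSec, TimeUnit.SECONDS);
    }

    public void stop() {
        scheduler.shutdownNow();
    }
}
```

Validating the interval up front matches the changelog's "check" item: a zero or negative interval would otherwise make scheduleAtFixedRate fail later or silently misbehave.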
@jerqi Updated, and all CI tests passed.
LGTM, I have no other suggestion. cc @colinmjj @duanmeng
Gentle ping @jerqi
Gentle ping @jerqi
Should you ping @colinmjj @duanmeng ?
@colinmjj @duanmeng If you have time, could you help review this?
Looking forward to your reply. Thanks.