incubator-uniffle
Support storing shuffle data to secured dfs cluster
What changes were proposed in this pull request?
Support storing shuffle data to a secured HDFS cluster under the Spark job user's own permissions on the shuffle server side.
Why are the changes needed?
When using the storage type MEMORY_LOCALFILE_HDFS, we hit problems flushing shuffle data to secured HDFS clusters due to missing credentials. To solve this, keytabs need to be distributed to all shuffle servers and logged in/refreshed by crontab or other means.
But this approach is not secure; it's better to flush data as the corresponding HDFS user, for data isolation.
We hope that:
- user A launches a Spark application and sends shuffle data to the shuffle server.
- Shuffle server should write shuffle data with "user A" into HDFS.
- Reduce tasks (launched by user A) could read shuffle data in HDFS with user A.

Otherwise, user A may not have permission to read shuffle data written by the shuffle server (another user) if it is not owned by A.
More detail and motivation can be found in design doc: https://docs.google.com/document/d/1pIDwCwv8iwnXmFQeTZKA5tK55SRc_f0SPKYFONA5K70
Does this PR introduce any user-facing change?
Yes
How was this patch tested?
Manual tests and unit tests.
@zuston To access the secured DFS cluster:
- For the Spark client, it can rely on Spark's implementation and read data according to the delegation token.
- For the shuffle server, the Hadoop conf can be updated with security enabled when writing data to HDFS.

What's the advantage of adding HadoopAccessorProvider?
@colinmjj @jerqi
For the Spark client, it can rely on Spark's implementation and read data according to the delegation token.
Yes. There is no need to retrieve the FileSystem via a UGI proxy user; the credentials have already been fetched on the Spark driver side at startup.
For the shuffle server, the Hadoop conf can be updated with security enabled when writing data to HDFS. What's the advantage of adding HadoopAccessorProvider?
Now the shuffle server can't access a secured cluster because it lacks a keytab.
To solve this, I introduced some configs so that the shuffle server can log in and impersonate the job user when writing HDFS files, via the proxy-user mechanism.
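The shuffle-server-side login is driven by a few Kerberos-related options. A hedged sketch of what such a configuration might look like, in Uniffle's key-value conf style; only `rss.security.hadoop.kerberos.relogin.interval.sec` is named elsewhere in this thread, so treat the other key names and all values as illustrative assumptions:

```properties
# Enable Kerberos access on the shuffle server (illustrative key name)
rss.security.hadoop.kerberos.enable true
# Principal and keytab used by the shuffle server to log in as the super user,
# which then impersonates job users via the proxy-user mechanism (illustrative)
rss.security.hadoop.kerberos.principal rss/_HOST@EXAMPLE.COM
rss.security.hadoop.kerberos.keytab.file /etc/security/keytabs/rss.keytab
# Relogin interval in seconds (this key is mentioned in the PR changelog)
rss.security.hadoop.kerberos.relogin.interval.sec 600
```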
Have you tried the following ways to update the Hadoop configuration?
- Option 1: in the shuffle server, conf entries prefixed with rss.server.hadoop will be added to the Hadoop configuration for data writing.
- Option 2: the coordinator can manage the information for different HDFS clusters; you can check the related API rpc fetchRemoteStorage(FetchRemoteStorageRequest) for more detail.

For the keytab problem, do you mean it can't exist on the shuffle server?
We may not need to support secured HDFS in Uniffle: DFSClient creates a UGI and spawns a renewal thread automatically when the shuffle server gets a FileSystem. Normally you can use the hdfs CLI to read/write files in a secured HDFS if the Kerberos-related config under HADOOP_HOME is right. Could you check the code in DFSClient and UserGroupInformation, and test against a secured HDFS without any code change? @zuston
I think what @zuston mentioned here is the case, for example:
- user A launches a Spark application and sends shuffle data to the shuffle server.
- Shuffle server should write shuffle data with "user A" into HDFS.
- Reduce tasks (launched by user A) could read shuffle data in HDFS with user A.
Otherwise, user A may not have permission to read shuffle data written by the shuffle server (another user) if it is not owned by A.
I'm not sure if I understand correctly or not? @zuston
@jerryshao You have caught my thoughts and thanks very much for explaining it in detail. Sorry I didn't explain clearly in advance.
So I think using the super user to delegate proxy user will be better in access control.
Got your points. I think this feature is necessary to support proxy users when using secured HDFS.
Now Uniffle already supports multiple HDFS clusters. Is it possible to use a secured HDFS cluster and an unsecured HDFS cluster together? Some configuration may need to be added to RemoteStorageConfItem.
https://github.com/apache/incubator-uniffle/blob/0f6a896efbd3a435de5e7a5d28843ecd05c38bde/proto/src/main/proto/Rss.proto#L360
I think the logged-in UGI can get the unsecured HDFS filesystem, but this case is not covered by the current test cases; maybe I can add it later.
Codecov Report
Merging #53 (3a01bf3) into master (bc621c3) will increase coverage by 0.35%. The diff coverage is 72.58%.
@@ Coverage Diff @@
## master #53 +/- ##
============================================
+ Coverage 58.00% 58.36% +0.35%
- Complexity 1231 1260 +29
============================================
Files 152 157 +5
Lines 8237 8384 +147
Branches 771 779 +8
============================================
+ Hits 4778 4893 +115
- Misses 3215 3240 +25
- Partials 244 251 +7
| Impacted Files | Coverage Δ | |
|---|---|---|
| ...he/uniffle/client/impl/ShuffleWriteClientImpl.java | 25.37% <0.00%> (-0.59%) | :arrow_down: |
| ...g/apache/uniffle/coordinator/CoordinatorUtils.java | 65.21% <ø> (+4.10%) | :arrow_up: |
| ...pache/uniffle/server/ShuffleServerGrpcService.java | 1.00% <0.00%> (-0.01%) | :arrow_down: |
| ...he/uniffle/server/buffer/ShuffleBufferManager.java | 82.25% <ø> (ø) | |
| ...org/apache/uniffle/storage/common/HdfsStorage.java | 0.00% <0.00%> (ø) | |
| ...storage/handler/impl/HdfsShuffleDeleteHandler.java | 0.00% <0.00%> (ø) | |
| ...e/storage/handler/impl/HdfsShuffleReadHandler.java | 51.02% <ø> (ø) | |
| ...rage/request/CreateShuffleWriteHandlerRequest.java | 0.00% <0.00%> (ø) | |
| ...ache/uniffle/storage/util/ShuffleStorageUtils.java | 64.13% <ø> (+3.13%) | :arrow_up: |
| .../apache/uniffle/coordinator/CoordinatorServer.java | 65.59% <33.33%> (-3.46%) | :arrow_down: |
| ... and 21 more | | |
Updated. @jerqi.
Besides, I think I will introduce some integration tests later.
What changes were proposed in this pull request?
Support storing shuffle data to secured dfs cluster
Why are the changes needed?
Now Uniffle doesn't support accessing shuffle data on a secured DFS cluster.
Does this PR introduce any user-facing change?
Yes
How was this patch tested?
Manual tests.
Should we provide more detailed description?
@zuston To access secured DFS, I think the process should be:
- get the remote storage from the coordinator
- register the remote storage to the shuffle server with user info
- shuffle server stores the mapping appId -> user
- when the shuffle server writes to DFS with the write handler, get the user info and write data in proxy mode via doAs()

What do you think?
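The appId -> user mapping in the steps above can be sketched with a plain concurrent map. This is a hypothetical stand-in, not Uniffle's actual class (the name AppUserRegistry and the fallback behavior are assumptions); in the real write path the HDFS write would run inside UserGroupInformation.createProxyUser(user, loginUgi).doAs(...):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of the appId -> user mapping kept by the shuffle server.
// In the PR the user travels in the register request, and the write handler
// would wrap the HDFS write in UGI.doAs() to impersonate that user.
public class AppUserRegistry {
    private static final Map<String, String> APP_TO_USER = new ConcurrentHashMap<>();

    // Called when an app registers its remote storage with user info.
    public static void register(String appId, String user) {
        APP_TO_USER.put(appId, user);
    }

    // Called by the write handler to pick the user to impersonate.
    public static String getUser(String appId) {
        // Fall back to the server's own user if the app never registered one.
        return APP_TO_USER.getOrDefault(appId, System.getProperty("user.name"));
    }

    // Cleanup when the app finishes, so the map does not grow unbounded.
    public static void unregister(String appId) {
        APP_TO_USER.remove(appId);
    }
}
```

A write handler bound to one app would then call getUser(appId) once at creation time and reuse it for every flush of that app's data.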
Access to secured clusters and isolation of user shuffle data are orthogonal; you can still do the isolation in a non-secure cluster (using the proxy-user way in your PR). Could you please write the PR's description in more detail, and write a design doc describing your motivation, design, workflow, compatibility, and test cases in different scenarios, since this is a huge and critical change? @zuston
@zuston To access secured DFS, I think the process should be:
- get the remote storage from the coordinator
- register the remote storage to the shuffle server with user info
- shuffle server stores the mapping appId -> user
- when the shuffle server writes to DFS with the write handler, get the user info and write data in proxy mode via doAs()

What do you think?
Got your points.
As far as I know, HdfsStorage is shared across the entire JVM. So now the user will only be added to CreateShuffleWriteHandlerRequest, which is bound to the app's read/write operations. I think the above changes are reflected in the latest commit.
Access to secured clusters and isolation of user shuffle data are orthogonal; you can still do the isolation in a non-secure cluster (using the proxy-user way in your PR).
So the proxy user could also be used to write data to an unsecured DFS cluster, to achieve data isolation? Please let me know whether I am right.
Could you please help to write the pr's description in more detail, and write a design doc to describe your motivation, design, workflow, compatibility, and test case in different scenarios since it's a huge and critical change.
Do I need to introduce a new page to submit a proposal, like an RFC? Or just write it in the description? Or open a new issue to track and discuss?
@duanmeng
So the proxy user could also be used to write data to an unsecured DFS cluster, to achieve data isolation? Please let me know whether I am right.
Yes, you're right. Say user B cannot access the path created by user A; they are both proxy users of the user running the Uniffle cluster.
Do I need to introduce a new page to submit a proposal, like an RFC? Or just write it in the description? Or open a new issue to track and discuss?
I suggest you enrich your description and add your design doc's (google doc or tencent doc) link to it.
Updated the description and submitted a simple proposal (https://docs.google.com/document/d/1pIDwCwv8iwnXmFQeTZKA5tK55SRc_f0SPKYFONA5K70). Feel free to discuss or comment further.
@jerqi @jerryshao @duanmeng @colinmjj
Besides, it would be better if the Uniffle community had a public confluence wiki for proposals, like Flink's: https://cwiki.apache.org/confluence/display/FLINK/Flink+Improvement+Proposals
If you have time, could you help review this proposal? @jerqi
If you have time, could you help review this proposal? @jerqi
I have left some comments.
If you have time, could you help review this proposal? @jerqi
I have left some comments.
Addressed the comments.
@zuston Did you commit the latest source? Some previous comments seem not to be fixed, e.g., user shouldn't be a member of RemoteStorage:
message RemoteStorage {
string path = 1;
repeated RemoteStorageConfItem remoteStorageConf = 2;
string user = 3;
}
@zuston Did you commit the latest source? Some previous comments seem not to be fixed, e.g., user shouldn't be a member of RemoteStorage:
message RemoteStorage { string path = 1; repeated RemoteStorageConfItem remoteStorageConf = 2; string user = 3; }
Sorry, I forgot to submit, but I think I can push a new commit tomorrow.
I tested this PR in our internal env.
- [x] coordinator reads the exclude-node file from kerberized HDFS
- [x] shuffle server writes shuffle data to kerberized HDFS and the client reads it directly
Changelog
- Add the check of rss.security.hadoop.kerberos.relogin.interval.sec
- Add more test cases about kerberos HDFS cluster
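The relogin interval check added in this changelog presumably drives a periodic Kerberos relogin; in Hadoop that is typically done via UserGroupInformation's relogin-from-keytab support. A stdlib-only sketch of the scheduling part, under the assumption that a single relogin task runs at a fixed interval (the class name and the counter stand-in are hypothetical, not Uniffle code):

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch of a periodic relogin scheduler. In a real shuffle server the
// scheduled Runnable would perform the Hadoop UGI relogin; here a counter
// stands in so the sketch stays Hadoop-free and runnable.
public class KerberosReloginScheduler {
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();
    final AtomicInteger reloginAttempts = new AtomicInteger();

    // intervalSec mirrors rss.security.hadoop.kerberos.relogin.interval.sec;
    // the check added in the PR would reject non-positive values like this.
    public void start(long intervalSec) {
        if (intervalSec <= 0) {
            throw new IllegalArgumentException("relogin interval must be positive");
        }
        scheduler.scheduleAtFixedRate(
                reloginAttempts::incrementAndGet, // stand-in for the relogin call
                intervalSec, intervalSec, TimeUnit.SECONDS);
    }

    public void stop() {
        scheduler.shutdownNow();
    }
}
```

Validating the interval up front matches the changelog's "check" item: a zero or negative interval would otherwise make scheduleAtFixedRate fail later or silently misbehave.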
@jerqi Updated, and all CI tests passed.
LGTM, I have no other suggestion. cc @colinmjj @duanmeng
Gentle ping @jerqi
Gentle ping @jerqi
Should you ping @colinmjj @duanmeng ?
@colinmjj @duanmeng If you have time, could you help review this?
Looking forward to your reply. Thanks.