incubator-uniffle
incubator-uniffle copied to clipboard
[#1303] refactor(client): using the configOption in RssTezConfig
What changes were proposed in this pull request?
Improvement: using the configOption in RssTezConfig #1303
Why are the changes needed?
Fix: https://github.com/apache/incubator-uniffle/issues/1303
Does this PR introduce any user-facing change?
No.
How was this patch tested?
Unit test
Codecov Report
Attention: 79 lines
in your changes are missing coverage. Please review.
Comparison is base (
15a399a
) 53.43% compared to head (79a08b2
) 54.88%. Report is 4 commits behind head on master.
Additional details and impacted files
@@ Coverage Diff @@
## master #1413 +/- ##
============================================
+ Coverage 53.43% 54.88% +1.45%
- Complexity 2723 2737 +14
============================================
Files 419 402 -17
Lines 23992 22015 -1977
Branches 2046 2063 +17
============================================
- Hits 12820 12083 -737
+ Misses 10378 9201 -1177
+ Partials 794 731 -63
:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.
I like this PR that refactor the client config. Could you help fix the failed test cases
Could you help review this? @zhengchenyu
I like this PR that refactor the client config. Could you help fix the failed test cases
Ok, I will fix the failed test cases recently.
ping @zhengchenyu
It is a great PR! The code about config will be cleaner! But I've always had a question. There are a lot of transitions from Configuration to RssConf, then from RssConf to Configuration. It seems strange. In this PR, I think @lifeSo used cache Configuration in TezClientConf, it is more efficient, but additional memory is required. @zuston What do you think of this approach?
In this PR, I think @lifeSo used cache Configuration in TezClientConf, it is more efficient, but additional memory is required.
LGTM. It looks the memory will not consume much more. If I'm wrong, feel free to correct.
I summarized the usage of TezClientConf, and why need the transitions:
-
GetShuffleServerRequest & GetShuffleServerResponse & IdUtils & InputContextUtils no nedd conf
-
RssDAGAppMaster need TezClientConf to get arguement and transitions.
-
TezRemoteShuffleManager need TezClientConf and configuration ,to get arguement and transitions.
-
RssTezUtils need RssTezUtils
-
ShuffleAssignmentsInfoWritable no need conf
-
TezClassLoader no need conf
-
UmbilicalUtils need configuration to pass arguement
-
RssTezAMPolicyProvider no conf
-
RemoteFetchedInput no conf
-
RssShuffleManager no need TezClientConf to get conf but transitions is needed.
-
RssSimpleFetchedInputAllocator need configuration and TezClientConf to get some conf
-
RssTezFetcher no need configuration and TezClientConf
-
RssTezFetcherTask need TezClientConf to get argue,also need Configuration to get transitions.
-
RssInMemoryMerger & RssMergeManager & RssShuffle & RssShuffleScheduler need configuration to call other Tez class, no need TezClientConf
-
RssTezBypassWriter & RssTezShuffleDataFetcher & WriteBuffer
-
WriteBufferManager no need , just pass RssConf.
-
RssSorter & RssUnSorter need Configuration and use only in super()
-
RssTezPerPartitionRecord no conf
-
RssOrderedPartitionedKVOutput & RssUnorderedKVOutput & RssUnorderedPartitionedKVOutput no nedd TezClientConf.
cc @zhengchenyu I'm not familiar with Tez code.
cc @zhengchenyu I'm not familiar with Tez code.
This PR does not involve the core logic of Tez. This PR is only about how the MR/Tez/Spark client uses configuration. I'm still a little confused about how to design the client config.
Emm... I take a look about current impl, I don't think we should take the tez prefix into the config key, you can refer the spark's RssClientConf .
In spark's partial configOption, we just extract the spark's conf like spark.rss.client.type into RssConf, that means we could share some general configOption defined in the RssClientConf. So you just only create the TezClientConfig.java, which could define some tez specified configOptions, shareable configs could use defined in RssClientConf. cc @zhengchenyu
@zuston I agree with you! If so, RssClientConf should be extended from RssClientConf (But for now, RssClientConf is extended from RssBaseConf ), and does not take the tez prefix into the config key. I think the tez prefix should be considered in the TezConfiguration. @lifeSo what about you?