doris
[Enhancement](spark load) Support for RM HA
Proposed changes
Issue Number: close #13806
Problem summary
Add YARN ResourceManager (RM) HA configuration to Spark Load. Spark already accepts the HA parameters through its configuration, so we only need to accept them in the DDL. For example:
CREATE EXTERNAL RESOURCE spark_resource_sinan_node_manager_ha
PROPERTIES (
    "type" = "spark",
    "spark.master" = "yarn",
    "spark.submit.deployMode" = "cluster",
    "spark.executor.memory" = "10g",
    "spark.yarn.queue" = "XXXX",
    "spark.hadoop.yarn.resourcemanager.address" = "XXXX:8032",
    "spark.hadoop.yarn.resourcemanager.ha.enabled" = "true",
    "spark.hadoop.yarn.resourcemanager.ha.rm-ids" = "rm1,rm2",
    "spark.hadoop.yarn.resourcemanager.hostname.rm1" = "XXXX",
    "spark.hadoop.yarn.resourcemanager.hostname.rm2" = "XXXX",
    "spark.hadoop.fs.defaultFS" = "hdfs://XXXX",
    "spark.hadoop.dfs.nameservices" = "hacluster",
    "spark.hadoop.dfs.ha.namenodes.hacluster" = "mynamenode1,mynamenode2",
    "spark.hadoop.dfs.namenode.rpc-address.hacluster.mynamenode1" = "XXX:8020",
    "spark.hadoop.dfs.namenode.rpc-address.hacluster.mynamenode2" = "XXXX:8020",
    "spark.hadoop.dfs.client.failover.proxy.provider" = "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider",
    "working_dir" = "hdfs://XXXX/doris_prd_data/sinan/spark_load/",
    "broker" = "broker_personas",
    "broker.username" = "hdfs",
    "broker.password" = "",
    "broker.dfs.nameservices" = "XXX",
    "broker.dfs.ha.namenodes.XXX" = "mynamenode1, mynamenode2",
    "broker.dfs.namenode.rpc-address.XXXX.mynamenode1" = "XXXX:8020",
    "broker.dfs.namenode.rpc-address.XXXX.mynamenode2" = "XXXX:8020",
    "broker.dfs.client.failover.proxy.provider" = "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider"
);
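To illustrate what "accepting" the RM HA keys in the DDL could involve, here is a minimal Java sketch of a consistency check over the resource properties. It is not the code in this PR: the class name `RmHaPropertyCheck`, the method `missingKeys`, and the rule itself (when `ha.enabled` is `true`, require `ha.rm-ids` plus a `hostname.<id>` or `address.<id>` entry for each listed id) are assumptions made for illustration only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Doris: checks that the RM HA keys passed
// in CREATE EXTERNAL RESOURCE are mutually consistent.
final class RmHaPropertyCheck {
    // Prefix used in the DDL above for YARN ResourceManager settings.
    private static final String PREFIX = "spark.hadoop.yarn.resourcemanager.";

    private RmHaPropertyCheck() {
    }

    /** Returns the RM HA keys that appear to be missing; empty if nothing is missing. */
    static List<String> missingKeys(Map<String, String> props) {
        List<String> missing = new ArrayList<>();

        // If HA is not switched on, there is nothing to check.
        if (!"true".equalsIgnoreCase(props.get(PREFIX + "ha.enabled"))) {
            return missing;
        }

        // Assumed rule: HA requires a non-empty list of ResourceManager ids.
        String rmIds = props.get(PREFIX + "ha.rm-ids");
        if (rmIds == null || rmIds.trim().isEmpty()) {
            missing.add(PREFIX + "ha.rm-ids");
            return missing;
        }

        // Assumed rule: each id needs either a hostname.<id> or an address.<id> entry.
        for (String rawId : rmIds.split(",")) {
            String rmId = rawId.trim();
            if (rmId.isEmpty()) {
                continue;
            }
            boolean hasHostname = props.containsKey(PREFIX + "hostname." + rmId);
            boolean hasAddress = props.containsKey(PREFIX + "address." + rmId);
            if (!hasHostname && !hasAddress) {
                missing.add(PREFIX + "hostname." + rmId);
            }
        }
        return missing;
    }
}
```

With the properties from the DDL above, `missingKeys` would return an empty list, because both `rm1` and `rm2` have a `hostname` entry. Dropping the `hostname.rm2` line would make it report that key.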
Checklist(Required)
- Does it affect the original behavior:
  - [ ] Yes
  - [x] No
  - [ ] I don't know
- Have unit tests been added:
  - [x] Yes
  - [ ] No
  - [ ] No Need
- Has documentation been added or modified:
  - [ ] Yes
  - [x] No
  - [ ] No Need
- Does it need to update dependencies:
  - [ ] Yes
  - [x] No
- Are there any changes that cannot be rolled back:
  - [ ] Yes (If Yes, please explain WHY)
  - [x] No
Further comments
If this is a relatively large or complex change, kick off the discussion at [email protected] by explaining why you chose the solution you did and what alternatives you considered, etc...
TeamCity pipeline, clickbench performance test result:
the sum of best hot time: 34.93 seconds
load time: 457 seconds
storage size: 17123356343 Bytes
https://doris-community-test-1308700295.cos.ap-hongkong.myqcloud.com/tmp/20221212035407_clickbench_pr_61757.html
When can this feature be merged into the new code branch?
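This needs to be verified. I will verify it as soon as I can, but I don't have a spare environment available for it right now, so it may have been delayed a bit.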
run buildall