doris

[Enhancement](spark load) Support for RM HA

Open • liujinhui1994 opened this issue 2 years ago • 4 comments

Proposed changes

Issue Number: close #13806

Problem summary

This PR adds YARN ResourceManager (RM) HA configuration to Spark Load. Spark already accepts HA parameters through its configuration, so Doris only needs to accept them in the resource DDL:

    CREATE EXTERNAL RESOURCE spark_resource_sinan_node_manager_ha
    PROPERTIES
    (
        "type" = "spark",
        "spark.master" = "yarn",
        "spark.submit.deployMode" = "cluster",
        "spark.executor.memory" = "10g",
        "spark.yarn.queue" = "XXXX",
        "spark.hadoop.yarn.resourcemanager.address" = "XXXX:8032",
        "spark.hadoop.yarn.resourcemanager.ha.enabled" = "true",
        "spark.hadoop.yarn.resourcemanager.ha.rm-ids" = "rm1,rm2",
        "spark.hadoop.yarn.resourcemanager.hostname.rm1" = "XXXX",
        "spark.hadoop.yarn.resourcemanager.hostname.rm2" = "XXXX",
        "spark.hadoop.fs.defaultFS" = "hdfs://XXXX",
        "spark.hadoop.dfs.nameservices" = "hacluster",
        "spark.hadoop.dfs.ha.namenodes.hacluster" = "mynamenode1,mynamenode2",
        "spark.hadoop.dfs.namenode.rpc-address.hacluster.mynamenode1" = "XXX:8020",
        "spark.hadoop.dfs.namenode.rpc-address.hacluster.mynamenode2" = "XXXX:8020",
        "spark.hadoop.dfs.client.failover.proxy.provider" = "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider",
        "working_dir" = "hdfs://XXXX/doris_prd_data/sinan/spark_load/",
        "broker" = "broker_personas",
        "broker.username" = "hdfs",
        "broker.password" = "",
        "broker.dfs.nameservices" = "XXX",
        "broker.dfs.ha.namenodes.XXX" = "mynamenode1, mynamenode2",
        "broker.dfs.namenode.rpc-address.XXXX.mynamenode1" = "XXXX:8020",
        "broker.dfs.namenode.rpc-address.XXXX.mynamenode2" = "XXXX:8020",
        "broker.dfs.client.failover.proxy.provider" = "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider"
    );
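Spark copies every property prefixed with `spark.hadoop.` into the Hadoop `Configuration` and strips the prefix. That is why the RM HA keys above can pass straight through the resource properties. The Python sketch below mimics that prefix stripping on a subset of the DDL's properties. It then runs a simplified sanity check: every id in `yarn.resourcemanager.ha.rm-ids` should have a matching `yarn.resourcemanager.hostname.<id>` entry. Both helpers, `to_hadoop_conf` and `missing_rm_hostnames`, are hypothetical illustrations and do not belong to Doris or Spark. The check is an assumption as well: YARN also accepts per-id address keys in place of a hostname, so a real validation would be looser than this one.

```python
# Hypothetical helpers for illustration only; not part of Doris or Spark.
SPARK_HADOOP_PREFIX = "spark.hadoop."


def to_hadoop_conf(props: dict[str, str]) -> dict[str, str]:
    """Mimic Spark: copy each spark.hadoop.* key into a Hadoop-style dict with the prefix stripped."""
    return {
        key[len(SPARK_HADOOP_PREFIX):]: value
        for key, value in props.items()
        if key.startswith(SPARK_HADOOP_PREFIX)
    }


def missing_rm_hostnames(hadoop_conf: dict[str, str]) -> list[str]:
    """Return RM ids from ha.rm-ids that lack a hostname.<id> entry (simplified check)."""
    enabled = hadoop_conf.get("yarn.resourcemanager.ha.enabled", "false")
    if enabled.strip().lower() != "true":
        return []  # HA disabled: nothing to check
    rm_ids = [
        rm_id.strip()
        for rm_id in hadoop_conf.get("yarn.resourcemanager.ha.rm-ids", "").split(",")
        if rm_id.strip()
    ]
    return [
        rm_id
        for rm_id in rm_ids
        if f"yarn.resourcemanager.hostname.{rm_id}" not in hadoop_conf
    ]


if __name__ == "__main__":
    # A subset of the resource properties from the DDL above (hosts left redacted).
    resource_props = {
        "type": "spark",
        "spark.master": "yarn",
        "spark.hadoop.yarn.resourcemanager.ha.enabled": "true",
        "spark.hadoop.yarn.resourcemanager.ha.rm-ids": "rm1,rm2",
        "spark.hadoop.yarn.resourcemanager.hostname.rm1": "XXXX",
        "spark.hadoop.yarn.resourcemanager.hostname.rm2": "XXXX",
    }
    conf = to_hadoop_conf(resource_props)
    print(sorted(conf))
    print(missing_rm_hostnames(conf))  # prints [] because rm1 and rm2 both have a hostname
```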

Checklist (Required)

  1. Does it affect the original behavior:
    • [ ] Yes
    • [x] No
    • [ ] I don't know
  2. Have unit tests been added:
    • [x] Yes
    • [ ] No
    • [ ] No Need
  3. Has documentation been added or modified:
    • [ ] Yes
    • [x] No
    • [ ] No Need
  4. Does it need to update dependencies:
    • [ ] Yes
    • [x] No
  5. Are there any changes that cannot be rolled back:
    • [ ] Yes (If Yes, please explain WHY)
    • [x] No

Further comments

If this is a relatively large or complex change, kick off the discussion on the mailing list by explaining why you chose this solution, what alternatives you considered, etc.

liujinhui1994 • Dec 12 '22 03:12

TeamCity pipeline, clickbench performance test result:
  the sum of best hot time: 34.93 seconds
  load time: 457 seconds
  storage size: 17123356343 Bytes
  https://doris-community-test-1308700295.cos.ap-hongkong.myqcloud.com/tmp/20221212035407_clickbench_pr_61757.html

hello-stephen • Dec 12 '22 03:12

When can this feature be merged into the new code branch?

servlet001 • Feb 20 '23 07:02


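This needs to be verified, and I will do so as soon as possible. I don't currently have a spare environment to test it on, so there may be some delay.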

liujinhui1994 • Feb 24 '23 06:02

run buildall

morningman • Mar 03 '23 09:03