[BugFix] The HDFS directory is not removed when the Spark resource is dropped
Steps to reproduce the behavior (Required)
- Create a Spark load job:
LOAD LABEL pre_stream.test_load_ly_2 (
    DATA FROM TABLE test_list_dup_sr_external_h2s_foit_820240510
    INTO TABLE test_list_dup_sr
    TEMPORARY PARTITION(temp__p20230930_BR)
    SET (
        `id` = `id`,
        `name` = `name`,
        `dt` = '2023-09-30',
        `country_code` = 'BR'
    )
) WITH RESOURCE 'spark_resource' (
    "spark.yarn.tags" = "xxx05131",
    "spark.dynamicAllocation.enabled" = "true",
    "spark.executor.memory" = "3g",
    "spark.executor.memoryOverhead" = "2g",
    "spark.streaming.batchDuration" = "5",
    "spark.executor.cores" = "1",
    "spark.yarn.executor.memoryOverhead" = "2g",
    "spark.speculation" = "false",
    "spark.dynamicAllocation.minExecutors" = "2",
    "spark.dynamicAllocation.maxExecutors" = "100"
) PROPERTIES (
    "timeout" = "72000",
    "spark_load_submit_timeout" = "7200"
);
- A Spark repository directory is created on HDFS, as the log shows:
2024-05-14 01:42:12,013 INFO (pending_load_task_scheduler_pool-1|498) [SparkRepository.upload():302] finished to upload file, localPath=/home/hadoop/starrocks-current/fe/spark-dpp/spark-dpp-1.0.0-jar-with-dependencies.jar, remotePath=hdfs://ClusterNmg/user/prod_xxx/sparketl/1384206915/__spark_repository__db__tb_sr__1019adb1d38c/__archive_1.0.0/__lib__spark-dpp-1.0.0-jar-with-dependencies.jar
2024-05-14 01:42:12,077 INFO (pending_load_task_scheduler_pool-1|498) [SparkRepository.rename():316] finished to rename file, originPath=hdfs://ClusterNmg/user/prod_xxx/sparketl/1384206915/__spark_repository__db__tb_sr__1019adb1d38c/__archive_1.0.0/__lib__spark-dpp-1.0.0-jar-with-dependencies.jar, destPath=hdfs://ClusterNmg/user/prod_xxx/sparketl/1384206915/__spark_repository__db__tb_sr__1019adb1d38c/__archive_1.0.0/__lib_70688c469808112f344091125a860404_spark-dpp-1.0.0-jar-with-dependencies.jar
- Drop the Spark resource:
drop resource spark_resource
- The HDFS directory is still present after the Spark resource is dropped:
[hadoop@bigdata-starrocks-xxx ~]$ hdfs dfs -ls hdfs://ClusterNmg/user/prod_xxx/sparketl/1384206915/__spark_repository__spark_resource/__archive_1.0.0/
Found 2 items
-rw-r--r-- 3 prod_xxx supergroup 394653421 2024-05-20 10:54 hdfs://ClusterNmg/user/prod_xxx/sparketl/1384206915/__spark_repository__spark_resource/__archive_1.0.0/__lib_62eff19a2751990e17b47aa258fb7623_spark-2x.zip
-rw-r--r-- 3 prod_xxx supergroup 4013682 2024-05-20 10:53 hdfs://ClusterNmg/user/prod_xxx/sparketl/1384206915/__spark_repository__spark_resource/__archive_1.0.0/__lib_70688c469808112f344091125a860404_spark-dpp-1.0.0-jar-with-dependencies.jar
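Until the drop path cleans this up, the leftover directory can be removed by hand. The following is only a sketch: it assumes the layout `<working_dir>/<id>/__spark_repository__<resource_name>`, which is inferred from the listing above and not confirmed against the StarRocks source. The function names `spark_repo_dir` and `cleanup_spark_repo` and the `DRY_RUN` variable are hypothetical; `hdfs dfs -rm -r` is the standard recursive delete.

```shell
#!/usr/bin/env bash

# Build the repository directory for a Spark resource.
# Assumption: <working_dir>/<id>/__spark_repository__<resource_name>,
# inferred from the listing in this report.
spark_repo_dir() {
  local working_dir="${1%/}" id="$2" resource="$3"
  printf '%s/%s/__spark_repository__%s\n' "$working_dir" "$id" "$resource"
}

# Remove the leftover directory.
# DRY_RUN defaults to 1, so by default the command is only printed.
cleanup_spark_repo() {
  local dir
  dir="$(spark_repo_dir "$@")"
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "hdfs dfs -rm -r $dir"
  else
    hdfs dfs -rm -r "$dir"
  fi
}

# Print the command for the resource from this report without running it.
cleanup_spark_repo hdfs://ClusterNmg/user/prod_xxx/sparketl 1384206915 spark_resource
```

Set `DRY_RUN=0` only after confirming that no other resource or running load job still uses the directory.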
Expected behavior (Required)
Dropping the Spark resource also deletes its Spark repository directory on HDFS.
Real behavior (Required)
The Spark resource is dropped, but its Spark repository directory on HDFS is not removed.
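The difference between the expected and real behavior can be checked from a shell after running `drop resource`. This is a minimal sketch, not part of StarRocks: `check_repo_removed` and the `HDFS_CMD` override are hypothetical names, and the only real call used is `hdfs dfs -test -d`, which exits with 0 when the path is a directory.

```shell
#!/usr/bin/env bash

# HDFS_CMD is a hypothetical override so the check can be pointed at a stub.
HDFS_CMD="${HDFS_CMD:-hdfs}"

# Report whether a repository directory still exists after DROP RESOURCE.
# Returns 0 if the directory is gone (expected behavior),
#         1 if it is still there (the behavior reported here),
#         2 if the hdfs command cannot be found.
check_repo_removed() {
  local dir="$1"
  if ! command -v "$HDFS_CMD" >/dev/null 2>&1; then
    echo "ERROR: $HDFS_CMD not found" >&2
    return 2
  fi
  if "$HDFS_CMD" dfs -test -d "$dir"; then
    echo "LEFTOVER: $dir"
    return 1
  fi
  echo "REMOVED: $dir"
}

# Usage, with the directory from this report:
#   check_repo_removed hdfs://ClusterNmg/user/prod_xxx/sparketl/1384206915/__spark_repository__spark_resource
```

With the behavior described in this report, the call prints a `LEFTOVER:` line and returns 1; once the directory is removed together with the resource, it prints `REMOVED:` and returns 0.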
StarRocks version (Required)
- You can get the StarRocks version by executing SQL
select current_version()