HIVE-28051: LLAP: cleanup local folders on startup and periodically

Open abstractdog opened this issue 1 year ago • 4 comments

What changes were proposed in this pull request?

Implement a LocalDirCleaner which can remove old files from LLAP local dirs.
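
For readers who want a concrete picture of the idea, here is a minimal sketch of a periodic cleaner. It is not the code from this patch: the class name `LocalDirCleanerSketch`, the methods `cleanOnce` and `start`, and the choice to delete only regular files (leaving directories in place) are all assumptions made for illustration. It uses only standard `java.nio.file` and `java.util.concurrent` calls.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.time.Duration;
import java.time.Instant;
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical sketch only; not the LocalDirCleaner implemented in this PR. */
class LocalDirCleanerSketch {

  /**
   * Deletes regular files under root whose last-modified time is before
   * (now - threshold) and returns the paths that were selected for deletion.
   */
  static List<Path> cleanOnce(Path root, Duration threshold, Instant now) throws IOException {
    Instant cutoff = now.minus(threshold);
    List<Path> oldFiles;
    // Collect first, delete afterwards, so the directory walk is closed before we mutate it.
    try (Stream<Path> paths = Files.walk(root)) {
      oldFiles = paths
          .filter(Files::isRegularFile)
          .filter(p -> isOlderThan(p, cutoff))
          .collect(Collectors.toList());
    }
    for (Path p : oldFiles) {
      Files.deleteIfExists(p);
    }
    return oldFiles;
  }

  private static boolean isOlderThan(Path p, Instant cutoff) {
    try {
      return Files.getLastModifiedTime(p).toInstant().isBefore(cutoff);
    } catch (IOException e) {
      return false; // file vanished or is unreadable: skip it in this run
    }
  }

  /** Runs cleanOnce immediately (startup) and then once per interval. */
  static ScheduledExecutorService start(Path root, Duration threshold, Duration interval) {
    ScheduledExecutorService pool = Executors.newSingleThreadScheduledExecutor();
    pool.scheduleAtFixedRate(() -> {
      try {
        cleanOnce(root, threshold, Instant.now());
      } catch (IOException | RuntimeException e) {
        // Swallow and log: an uncaught exception would cancel all future runs.
        System.err.println("cleanup failed: " + e);
      }
    }, 0, interval.toMillis(), TimeUnit.MILLISECONDS);
    return pool;
  }
}
```

Under these assumptions, `LocalDirCleanerSketch.start(Paths.get("/apps/llap/work"), Duration.ofSeconds(60), Duration.ofSeconds(30))` would mirror the test settings described below; the caller is responsible for calling `shutdown()` on the returned executor.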

Why are the changes needed?

LLAP cannot clean up old files in every failure scenario, as a customer problem showed. While investigating HIVE-24272 I found that local files for failed queries/DAGs are already cleaned up, so this is clearly about leftovers after daemon crashes.

Does this PR introduce any user-facing change?

No.

Is the change a dependency upgrade?

No.

How was this patch tested?

Added a unit test and also tested on an LLAP daemon. Created a file as follows:

mkdir -p /apps/llap/work/usercache/hive/appcache/application_1707917402901_0001/3/output
touch /apps/llap/work/usercache/hive/appcache/application_1707917402901_0001/3/output/file.out

set quick intervals like:

hive.llap.local.dir.cleaner.file.modify.time.threshold=60s
hive.llap.local.dir.cleaner.cleanup.interval=30s

checked logs:

query-executor <14>1 2024-02-15T07:37:59.351Z query-executor-0-0 query-executor 1 79184f83-fdb0-4982-9bd9-4e297438f8be [mdc@18060 class="impl.LocalDirCleaner" level="INFO" thread="pool-14-thread-1"] Cleaning up files older than 2024-02-15T07:36:59.351485Z from /apps/llap/work
query-executor <14>1 2024-02-15T07:38:29.351Z query-executor-0-0 query-executor 1 79184f83-fdb0-4982-9bd9-4e297438f8be [mdc@18060 class="impl.LocalDirCleaner" level="INFO" thread="pool-14-thread-1"] Cleaning up files older than 2024-02-15T07:37:29.351488Z from /apps/llap/work
query-executor <14>1 2024-02-15T07:38:59.351Z query-executor-0-0 query-executor 1 79184f83-fdb0-4982-9bd9-4e297438f8be [mdc@18060 class="impl.LocalDirCleaner" level="INFO" thread="pool-14-thread-1"] Cleaning up files older than 2024-02-15T07:37:59.351487Z from /apps/llap/work
query-executor <14>1 2024-02-15T07:38:59.352Z query-executor-0-0 query-executor 1 79184f83-fdb0-4982-9bd9-4e297438f8be [mdc@18060 class="impl.LocalDirCleaner" level="INFO" thread="pool-14-thread-1"] Delete old file: /apps/llap/work/usercache/hive/appcache/application_1707917402901_0001/3/output/file.out
query-executor <14>1 2024-02-15T07:39:29.351Z query-executor-0-0 query-executor 1 79184f83-fdb0-4982-9bd9-4e297438f8be [mdc@18060 class="impl.LocalDirCleaner" level="INFO" thread="pool-14-thread-1"] Cleaning up files older than 2024-02-15T07:38:29.351486Z from /apps/llap/work
query-executor <14>1 2024-02-15T07:39:59.351Z query-executor-0-0 query-executor 1 79184f83-fdb0-4982-9bd9-4e297438f8be [mdc@18060 class="impl.LocalDirCleaner" level="INFO" thread="pool-14-thread-1"] Cleaning up files older than 2024-02-15T07:38:59.351485Z from /apps/llap/work
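
The timestamps in these logs are consistent with a cutoff of "run time minus threshold": each run is 30s after the previous one, and each "older than" instant is 60s before the run. A small sketch of that arithmetic follows. The class `CutoffExample`, its methods, and the file modification time `07:37:45Z` are hypothetical (the real mtime of `file.out` is not in the logs), and the strict "before cutoff" comparison is an assumption rather than something the logs confirm.

```java
import java.time.Duration;
import java.time.Instant;

/** Hypothetical illustration of the cutoff arithmetic seen in the logs above. */
class CutoffExample {

  // Cutoff for a cleanup run: files modified before this instant count as old.
  static Instant cutoff(Instant runTime, Duration threshold) {
    return runTime.minus(threshold);
  }

  // Assumed rule: a file is old if its mtime is strictly before the cutoff.
  static boolean isOld(Instant modifiedTime, Instant runTime, Duration threshold) {
    return modifiedTime.isBefore(cutoff(runTime, threshold));
  }

  public static void main(String[] args) {
    Duration threshold = Duration.ofSeconds(60); // file.modify.time.threshold=60s
    Instant mtime = Instant.parse("2024-02-15T07:37:45Z"); // hypothetical mtime
    String[] runs = {"2024-02-15T07:38:29.351Z", "2024-02-15T07:38:59.351Z"};
    for (String run : runs) {
      Instant runTime = Instant.parse(run);
      System.out.println(run + " cutoff=" + cutoff(runTime, threshold)
          + " delete=" + isOld(mtime, runTime, threshold));
    }
  }
}
```

With that hypothetical mtime, the 07:38:29 run would keep the file and the 07:38:59 run would delete it, which matches the order of the "Cleaning up" and "Delete old file" lines above.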

abstractdog, Jan 31 '24 15:01

I think you missed updating the commit message with Hive jira

simhadri-g, Jan 31 '24 15:01

> I think you missed updating the commit message with Hive jira

LOL, thanks, fixed

abstractdog, Jan 31 '24 15:01

unrelated test failure, I can rerun eventually, this can be reviewed

abstractdog, Feb 01 '24 06:02

Quality Gate passed

Issues
120 New issues

Measures
0 Security Hotspots
No data about Coverage
No data about Duplication

See analysis details on SonarCloud

sonarqubecloud[bot], Feb 22 '24 10:02

@zhangbutao : thanks for your comments, according to the last comment, I'm considering this as approved, let me know if it's otherwise

abstractdog, Mar 08 '24 07:03

> @zhangbutao : thanks for your comments, according to the last comment, I'm considering this as approved, let me know if it's otherwise

Yes, feel free to merge this change. :)

zhangbutao, Mar 08 '24 07:03