Mooncake
[mooncake-store]: prevent orphaned bucket data files from leaking dis…
Description
Add cleanup mechanism for bucket data files that are left orphaned when the process crashes between writing the data file and its metadata file.
Problem:
- BucketStorageBackend writes data file first, then .meta file
- If crash/failure occurs between these steps, data file is orphaned
- Init() only scans .meta files, never discovers orphans
- Orphaned files accumulate indefinitely, leaking disk space
Solution:
- Add immediate cleanup in WriteBucket() when metadata write fails
- Add orphan detection during Init() to clean up past crashes
- Make cleanup opt-in (disabled by default) via constructor flag
- Identify orphans by: numeric filename, no extension, no .meta pair
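The identification rule in the last bullet can be sketched as follows. This is a minimal illustration, not the actual Mooncake code; the helper name `IsOrphanCandidate` and the use of `std::filesystem` are assumptions:

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <filesystem>
#include <string>

namespace fs = std::filesystem;

// A data file is an orphan candidate when its filename is purely numeric
// (a bucket ID), it has no extension, and no sibling "<name>.meta" exists.
// Hypothetical helper name, not the actual Mooncake function.
bool IsOrphanCandidate(const fs::path& file) {
    const std::string name = file.filename().string();
    if (name.empty() || file.has_extension()) return false;
    const bool numeric =
        std::all_of(name.begin(), name.end(),
                    [](unsigned char c) { return std::isdigit(c) != 0; });
    if (!numeric) return false;
    fs::path meta = file;
    meta += ".meta";              // metadata pair: "<bucket_id>.meta"
    return !fs::exists(meta);     // no metadata file => orphan candidate
}
```

Note that all three conditions must hold: a non-numeric name or any extension disqualifies the file, so `.meta` files themselves are never candidates.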
Changes:
- BucketStorageBackend: Add enable_orphan_cleanup parameter (default: false)
- WriteBucket(): Clean up data file if StoreBucketMetadata() fails
- Init(): Scan for and remove orphaned data files when enabled
- Add comprehensive test coverage for orphan cleanup
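The immediate-cleanup path in `WriteBucket()` can be illustrated like this. This is a simplified sketch; the function signatures, the `simulate_failure` hook, and the stand-in helpers are assumptions for illustration, not the actual Mooncake API:

```cpp
#include <cassert>
#include <filesystem>
#include <fstream>
#include <string>

namespace fs = std::filesystem;

// Simplified stand-ins for the real backend pieces (names are illustrative).
bool WriteDataFile(const fs::path& p, const std::string& data) {
    std::ofstream out(p, std::ios::binary);
    return static_cast<bool>(out << data);
}

bool StoreBucketMetadata(const fs::path& data_path, bool simulate_failure) {
    if (simulate_failure) return false;  // e.g. disk full, I/O error
    std::ofstream out(data_path.string() + ".meta");
    return static_cast<bool>(out << "ok");
}

// WriteBucket: write the data file first, then the metadata; if the
// metadata write fails, remove the just-written data file so it cannot
// be left behind as an orphan.
bool WriteBucket(const fs::path& data_path, const std::string& data,
                 bool simulate_meta_failure) {
    if (!WriteDataFile(data_path, data)) return false;
    if (!StoreBucketMetadata(data_path, simulate_meta_failure)) {
        std::error_code ec;
        fs::remove(data_path, ec);       // best-effort immediate cleanup
        return false;
    }
    return true;
}
```

A crash between the two writes still leaves an orphan behind, which is why the PR also scans for orphans during `Init()`.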
Issue link: https://github.com/kvcache-ai/Mooncake/issues/1139
Type of Change
- Types
- [x] Bug fix
- [ ] New feature
- [ ] Transfer Engine
- [x] Mooncake Store
- [ ] Mooncake EP
- [ ] Integration
- [ ] P2P Store
- [ ] Python Wheel
- [ ] Breaking change
- [ ] CI/CD
- [ ] Documentation update
- [ ] Other
How Has This Been Tested?
- OrphanedDataFileCleanup: Creates valid bucket, manually orphans files, verifies cleanup
- Tests cleanup enabled vs disabled (default behavior)
- Validates valid buckets preserved during orphan cleanup
- Tests file count reduction after cleanup
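The scenario those tests cover follows roughly this shape. This is a plain-assert sketch of the scenario, not the repository's actual test code; the directory layout and names are illustrative:

```cpp
#include <cassert>
#include <filesystem>
#include <fstream>
#include <vector>

namespace fs = std::filesystem;

// Reproduces the test scenario: one valid bucket (data + .meta pair) and
// one manually orphaned data file; after the cleanup pass, the valid
// bucket must survive and the orphan must be gone.
bool RunOrphanCleanupScenario() {
    const fs::path dir = fs::temp_directory_path() / "orphan_cleanup_demo";
    fs::create_directories(dir);
    std::ofstream(dir / "100") << "bucket-data";
    std::ofstream(dir / "100.meta") << "bucket-meta";
    std::ofstream(dir / "200") << "orphaned-data";   // no .meta pair

    // Collect paths first, then delete, so the directory is never
    // mutated while being iterated.
    std::vector<fs::path> files;
    for (const auto& entry : fs::directory_iterator(dir))
        files.push_back(entry.path());
    for (const auto& p : files) {
        if (p.has_extension()) continue;       // skip .meta files
        fs::path meta = p;
        meta += ".meta";
        if (!fs::exists(meta)) fs::remove(p);  // orphan: no metadata pair
    }

    const bool ok = fs::exists(dir / "100") && fs::exists(dir / "100.meta") &&
                    !fs::exists(dir / "200");
    fs::remove_all(dir);
    return ok;
}
```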
Checklist
- [x] I have performed a self-review of my own code.
- [ ] I have updated the documentation.
- [x] I have added tests to prove my changes are effective.
Summary of Changes
Hello @maheshrbapatu, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
This pull request introduces a critical fix to prevent disk space leakage in the mooncake-store by addressing the issue of orphaned bucket data files. Previously, if a system crash occurred between writing a data file and its metadata, the data file would remain on disk indefinitely. The changes implement both immediate cleanup during write failures and a proactive scan during initialization to detect and remove these orphaned files, improving the storage backend's robustness and efficiency.
Highlights
- Orphaned File Prevention: Implemented a mechanism to prevent disk space leaks caused by orphaned bucket data files.
- Crash Consistency: Addresses scenarios where a process crashes between writing a data file and its corresponding metadata file.
- Immediate Cleanup: Adds logic to `WriteBucket()` to remove data files if their metadata write fails.
- Proactive Cleanup: Introduces an orphan detection and cleanup process during `Init()` to handle files from past crashes.
- Opt-in Feature: The orphan cleanup functionality is opt-in, controlled by a new `enable_orphan_cleanup` flag in the `BucketStorageBackend` constructor.
- Identification Criteria: Orphans are identified by having a purely numeric filename, no extension, and no associated `.meta` file.
There is a format check in the CI. Try `clang-format -i FILENAME` to format the code.
- The parameter `enable_orphan_cleanup_` doesn't seem to be necessary.
- In the case of a metadata write failure, we can simply clean up the corresponding data file within the `WriteBucket` function.
- Following the above logic, is it still necessary to use the `.bucket` suffix?
According to the discussion on the Slack thread, there are two possible scenarios: either the data write succeeds but `StoreBucketMetadata` fails (e.g., disk full, I/O error), or the process crashes between these two steps. Either way, it seems we would be left with a fully written data file on disk but no corresponding `.meta` file.
The `WriteBucket` function itself cannot handle the second case.
Should the second case be handled in the init function instead of the BatchOffload function?
I believe this PR does handle that case in the init function — unless I’m misunderstanding something. Let me know if I’ve overlooked anything.
Sorry, I made a mistake — there's only one question left: is it necessary to keep the enable_orphan_cleanup_ parameter?
Hello! Yes, we can remove the `enable_orphan_cleanup_` parameter; it isn't necessary at this point. I added it initially because I was unclear about the bucket files, since they didn't have a `.bucket` extension.

> Should the second case be handled in the init function instead of the BatchOffload function?

As @ykwd already pointed out, we clean it up in init.

> In the case of metadata write failure, we can simply clean up the corresponding data file within the WriteBucket function.

Yes, we can do this, and we are already doing it, as @ykwd mentioned. But if the node crashes after writing the data and before writing the metadata, we still need to clean it up during init.
Documenting these here for the community's reference, since most of the questions were already answered above.