[mooncake-store]: prevent orphaned bucket data files from leaking disk space

Open maheshrbapatu opened this issue 4 days ago • 8 comments

Description

Add cleanup mechanism for bucket data files that are left orphaned when the process crashes between writing the data file and its metadata file.

Problem:

  • BucketStorageBackend writes data file first, then .meta file
  • If crash/failure occurs between these steps, data file is orphaned
  • Init() only scans .meta files, never discovers orphans
  • Orphaned files accumulate indefinitely, leaking disk space
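The failure window described above can be sketched as follows. Names here (`WriteBucketSketch`, the plain numeric data filename, the `.meta` suffix) follow the PR description but are illustrative, not the backend's actual code:

```cpp
#include <cassert>
#include <filesystem>
#include <fstream>
#include <string>

namespace fs = std::filesystem;

// Step 1 makes the data file durable under its numeric name; step 2
// writes the ".meta" file afterwards. A crash between the two steps
// leaves the data file with no .meta pair, and an Init() that only
// scans *.meta files never discovers it again.
bool WriteBucketSketch(const fs::path& dir, const std::string& bucket_id,
                       const std::string& data, const std::string& meta) {
    std::ofstream data_file(dir / bucket_id);
    if (!(data_file << data)) return false;
    data_file.close();
    // <-- crash window: data file exists on disk, metadata does not.
    std::ofstream meta_file(dir / (bucket_id + ".meta"));
    if (!(meta_file << meta)) return false;
    return true;
}
```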

Solution:

  • Add immediate cleanup in WriteBucket() when metadata write fails
  • Add orphan detection during Init() to clean up past crashes
  • Make cleanup opt-in (disabled by default) via constructor flag
  • Identify orphans by: numeric filename, no extension, no .meta pair
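The identification criteria in the last bullet can be expressed as a small predicate. This is a sketch under the stated assumptions (purely numeric filename, no extension, missing `.meta` sibling); the function names are made up for illustration:

```cpp
#include <cassert>
#include <filesystem>
#include <fstream>
#include <string>

namespace fs = std::filesystem;

// Purely numeric and extension-less: matches how bucket data files
// are assumed to be named in this sketch.
bool HasNumericName(const fs::path& p) {
    const std::string name = p.filename().string();
    return !name.empty() &&
           name.find_first_not_of("0123456789") == std::string::npos;
}

// Orphan = numeric data file whose "<name>.meta" sibling is missing.
bool IsOrphanedDataFile(const fs::path& p) {
    if (p.has_extension() || !HasNumericName(p)) return false;
    fs::path meta = p;
    meta += ".meta";  // "<dir>/42" -> "<dir>/42.meta"
    return !fs::exists(meta);
}
```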

Changes:

  • BucketStorageBackend: Add enable_orphan_cleanup parameter (default: false)
  • WriteBucket(): Clean up data file if StoreBucketMetadata() fails
  • Init(): Scan for and remove orphaned data files when enabled
  • Add comprehensive test coverage for orphan cleanup
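The Init()-time scan in the changes above could look roughly like this. It is a minimal sketch, not the PR's actual implementation; the function name and directory layout are assumptions:

```cpp
#include <cassert>
#include <cstddef>
#include <filesystem>
#include <fstream>
#include <string>

namespace fs = std::filesystem;

// Scan the storage directory once at startup and delete any numeric,
// extension-less data file that lacks a ".meta" sibling. Returns the
// number of orphans removed; valid data/metadata pairs are untouched.
std::size_t CleanupOrphanedDataFiles(const fs::path& dir) {
    std::size_t removed = 0;
    for (const auto& entry : fs::directory_iterator(dir)) {
        if (!entry.is_regular_file()) continue;
        const fs::path& p = entry.path();
        const std::string name = p.filename().string();
        if (name.empty() || p.has_extension() ||
            name.find_first_not_of("0123456789") != std::string::npos)
            continue;  // not a bucket data file
        fs::path meta = p;
        meta += ".meta";
        if (!fs::exists(meta)) {  // no metadata pair -> orphan
            fs::remove(p);
            ++removed;
        }
    }
    return removed;
}
```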

Issue link: https://github.com/kvcache-ai/Mooncake/issues/1139

Type of Change

  • Types
    • [X] Bug fix
    • [ ] New feature
      • [ ] Transfer Engine
      • [x] Mooncake Store
      • [ ] Mooncake EP
      • [ ] Integration
      • [ ] P2P Store
      • [ ] Python Wheel
    • [ ] Breaking change
    • [ ] CI/CD
    • [ ] Documentation update
    • [ ] Other

How Has This Been Tested?

  • OrphanedDataFileCleanup: Creates valid bucket, manually orphans files, verifies cleanup
  • Tests cleanup enabled vs disabled (default behavior)
  • Validates valid buckets preserved during orphan cleanup
  • Tests file count reduction after cleanup

Checklist

  • [x] I have performed a self-review of my own code.
  • [ ] I have updated the documentation.
  • [x] I have added tests to prove my changes are effective.

maheshrbapatu avatar Nov 29 '25 22:11 maheshrbapatu

Summary of Changes

Hello @maheshrbapatu, I'm Gemini Code Assist[^1]! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a critical fix to prevent disk space leakage in the mooncake-store by addressing the issue of orphaned bucket data files. Previously, if a system crash occurred between writing a data file and its metadata, the data file would remain on disk indefinitely. The changes implement both immediate cleanup during write failures and a proactive scan during initialization to detect and remove these orphaned files, improving the storage backend's robustness and efficiency.

Highlights

  • Orphaned File Prevention: Implemented a mechanism to prevent disk space leaks caused by orphaned bucket data files.
  • Crash Consistency: Addresses scenarios where a process crashes between writing a data file and its corresponding metadata file.
  • Immediate Cleanup: Adds logic to WriteBucket() to remove data files if their metadata write fails.
  • Proactive Cleanup: Introduces an orphan detection and cleanup process during Init() to handle files from past crashes.
  • Opt-in Feature: The orphan cleanup functionality is opt-in, controlled by a new enable_orphan_cleanup flag in the BucketStorageBackend constructor.
  • Identification Criteria: Orphans are identified by having a purely numeric filename, no extension, and no associated .meta file.

[^1]: Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

gemini-code-assist[bot] avatar Nov 29 '25 22:11 gemini-code-assist[bot]

There is a format check in the CI. Try clang-format -i FILENAME to format the code.

ykwd avatar Dec 01 '25 02:12 ykwd

  1. The parameter enable_orphan_cleanup_ doesn't seem to be necessary.

  2. In the case of metadata write failure, we can simply clean up the corresponding data file within the WriteBucket function.

  3. Following the above logic, is it still necessary to use the .bucket suffix?

zhuxinjie-nz avatar Dec 01 '25 06:12 zhuxinjie-nz

  1. The parameter enable_orphan_cleanup_ doesn't seem to be necessary.
  2. In the case of metadata write failure, we can simply clean up the corresponding data file within the WriteBucket function.
  3. Following the above logic, is it still necessary to use the .bucket suffix?

According to the discussion on the Slack thread, there are two possible scenarios:

If the data write succeeds but StoreBucketMetadata fails (e.g., disk full, I/O error), or if the process crashes specifically between these two steps, it seems we would be left with a fully written data file on disk but no corresponding .meta file.

The WriteBucket function itself cannot handle the second case.

ykwd avatar Dec 01 '25 06:12 ykwd

  1. The parameter enable_orphan_cleanup_ doesn't seem to be necessary.
  2. In the case of metadata write failure, we can simply clean up the corresponding data file within the WriteBucket function.
  3. Following the above logic, is it still necessary to use the .bucket suffix?

According to the discussion on the Slack thread, there are two possible scenarios:

If the data write succeeds but StoreBucketMetadata fails (e.g., disk full, I/O error), or if the process crashes specifically between these two steps, it seems we would be left with a fully written data file on disk but no corresponding .meta file.

The WriteBucket function itself cannot handle the second case.

Should the second case be handled in the init function instead of the BatchOffload function?

zhuxinjie-nz avatar Dec 01 '25 06:12 zhuxinjie-nz

The WriteBucket function itself cannot handle the second case.

Should the second case be handled in the init function instead of the BatchOffload function?

I believe this PR does handle that case in the init function — unless I’m misunderstanding something. Let me know if I’ve overlooked anything.

ykwd avatar Dec 01 '25 06:12 ykwd

The WriteBucket function itself cannot handle the second case.

Should the second case be handled in the init function instead of the BatchOffload function?

I believe this PR does handle that case in the init function — unless I’m misunderstanding something. Let me know if I’ve overlooked anything.

Sorry, I made a mistake — there's only one question left: is it necessary to keep the enable_orphan_cleanup_ parameter?

zhuxinjie-nz avatar Dec 01 '25 07:12 zhuxinjie-nz

The WriteBucket function itself cannot handle the second case.

Should the second case be handled in the init function instead of the BatchOffload function?

I believe this PR does handle that case in the init function — unless I’m misunderstanding something. Let me know if I’ve overlooked anything.

Sorry, I made a mistake — there's only one question left: is it necessary to keep the enable_orphan_cleanup_ parameter?

Hello, yes, we can remove the enable_orphan_cleanup_ parameter, which is not necessary at this point. I added it initially because I was unclear about the bucket files, as they didn't have a .bucket extension.

Should the second case be handled in the init function instead of the BatchOffload function? As @ykwd already pointed out, we are cleaning it up in init.

In the case of metadata write failure, we can simply clean up the corresponding data file within the WriteBucket function. Yes, we can do this, and we are already doing it, as @ykwd mentioned. But if the node crashes after writing the data but before writing the metadata, we still need to clean it up during init.

Documenting these here for the community's reference, as most of the questions were already answered above.

maheshrbapatu avatar Dec 01 '25 13:12 maheshrbapatu