
refactor: Eliminate redundant cleanup logic in entry log extraction

nodece opened this issue 8 months ago • 5 comments

Motivation

The purpose of extractMetaFromEntryLogs is to extract metadata, not to perform garbage collection. The logic for removing entry logs with no active ledgers is already handled separately in doGcEntryLogs().

Changes

This change removes the redundant cleanup code from extractMetaFromEntryLogs to better separate concerns and avoid duplication.

nodece avatar Apr 29 '25 13:04 nodece

rerun failure checks

StevenLuMT avatar May 06 '25 07:05 StevenLuMT

rerun failure checks

nodece avatar May 06 '25 07:05 nodece

rerun failure checks

nodece avatar May 07 '25 04:05 nodece

rerun failure checks

nodece avatar May 13 '25 07:05 nodece

@StevenLuMT @zymap

In the current implementation, all metadata is first extracted and stored in a map before performing a full GC pass over the entire dataset. This approach can result in high memory usage and delayed cleanup, especially in environments with large or fragmented metadata stores.

I suggest replacing the current bulk extraction + GC logic with an incremental streaming approach, as follows:

Current Approach:

Map<String, Metadata> allMetadata = extractAllMetadata();
gc(allMetadata);

Proposed Approach:

for (String key : listAllMetadataKeys()) {
    Metadata metadata = extractMetadata(key);
    gc(metadata);
}

Benefits:

  • Reduced memory footprint: Avoids holding all metadata in memory at once.
  • More timely cleanup: GC is performed immediately after extraction.
  • Improved scalability: Handles large metadata sets more gracefully.
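To make the proposed loop concrete, here is a minimal, self-contained sketch of the streaming extraction + GC pattern. EntryLogMetadata, listAllMetadataKeys, extractMetadata, and the in-memory store are hypothetical stand-ins for illustration only, not BookKeeper's actual API; the point is that each entry log's metadata is extracted and collected in one pass, so nothing accumulates in a map.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch of the proposed incremental extraction + GC loop.
// All types and methods here are illustrative, not BookKeeper's real API.
public class StreamingGcSketch {

    // Hypothetical per-entry-log metadata: key plus whether any ledger is active.
    record EntryLogMetadata(String key, boolean hasActiveLedgers) {}

    // Stand-in metadata store: entry log key -> has active ledgers.
    static final Map<String, Boolean> STORE = new HashMap<>(Map.of(
            "log-0", true,
            "log-1", false,
            "log-2", false));

    static List<String> listAllMetadataKeys() {
        return new ArrayList<>(STORE.keySet());
    }

    static EntryLogMetadata extractMetadata(String key) {
        return new EntryLogMetadata(key, STORE.get(key));
    }

    // GC a single entry log immediately after its metadata is extracted,
    // instead of waiting for a bulk pass over all metadata.
    static void gc(EntryLogMetadata meta, List<String> removed) {
        if (!meta.hasActiveLedgers()) {
            removed.add(meta.key());
        }
    }

    public static void main(String[] args) {
        List<String> removed = new ArrayList<>();
        for (String key : listAllMetadataKeys()) {
            gc(extractMetadata(key), removed); // one metadata object live at a time
        }
        System.out.println("removed=" + removed.size());
    }
}
```

With this shape, peak memory is bounded by one metadata object rather than the whole map, and logs with no active ledgers are reclaimed as soon as they are seen.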

nodece avatar May 19 '25 03:05 nodece