refactor: Eliminate redundant cleanup logic in entry log extraction
Motivation
The purpose of extractMetaFromEntryLogs() is to extract metadata, not to perform garbage collection. Removing entry logs that contain no active ledgers is handled separately in doGcEntryLogs().
Changes
This change removes the redundant cleanup code from extractMetaFromEntryLogs() to better separate concerns and avoid duplicating logic that doGcEntryLogs() already performs.
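A minimal sketch of the intended split, under simplifying assumptions: LogMeta, GcSketch, and the isLedgerActive predicate are hypothetical stand-ins, not BookKeeper's actual EntryLogMetadata or GarbageCollectorThread APIs, and "removing" a log here only records its id. Extraction records every scanned log, including one with no ledgers; only doGcEntryLogs() decides what gets deleted.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Iterator;
import java.util.Map;
import java.util.Set;
import java.util.function.LongPredicate;

// Hypothetical, simplified stand-in for per-entry-log metadata:
// only the ids of ledgers that still have entries in the log.
final class LogMeta {
    final long logId;
    final Set<Long> ledgers = new HashSet<>();

    LogMeta(long logId) {
        this.logId = logId;
    }

    boolean isEmpty() {
        return ledgers.isEmpty();
    }
}

final class GcSketch {
    // entry log id -> metadata; filled by extraction, consumed by GC
    final Map<Long, LogMeta> entryLogMetaMap = new HashMap<>();
    // stand-in for entry log files that were actually deleted
    final Set<Long> removedLogs = new HashSet<>();

    // Extraction only: record which ledgers each scanned log contains.
    // Empty logs are kept here; deciding to delete them is GC's job.
    void extractMetaFromEntryLogs(Map<Long, Set<Long>> scannedLogs) {
        for (Map.Entry<Long, Set<Long>> e : scannedLogs.entrySet()) {
            LogMeta meta = new LogMeta(e.getKey());
            meta.ledgers.addAll(e.getValue());
            entryLogMetaMap.put(meta.logId, meta);
        }
    }

    // GC only: drop ledgers that are no longer active, then remove
    // every log whose metadata ends up empty.
    void doGcEntryLogs(LongPredicate isLedgerActive) {
        Iterator<LogMeta> it = entryLogMetaMap.values().iterator();
        while (it.hasNext()) {
            LogMeta meta = it.next();
            meta.ledgers.removeIf(id -> !isLedgerActive.test(id));
            if (meta.isEmpty()) {
                removedLogs.add(meta.logId);
                it.remove();
            }
        }
    }
}
```

With this split, a log that was already empty at scan time (log 3 in a scan of logs 1, 2, and 3) survives extraction and is deleted in the same doGcEntryLogs() pass that deletes logs emptied by ledger removal.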
rerun failure checks
@StevenLuMT @zymap
In the current implementation, all metadata is first extracted and stored in a map, and only then is a full GC pass run over the entire dataset. This approach can lead to high memory usage and delayed cleanup, especially in environments with large or fragmented metadata stores.
I suggest replacing the current bulk extraction + GC logic with an incremental, streaming approach:
Current Approach:
Map<String, Metadata> allMetadata = extractAllMetadata();
gc(allMetadata);
Proposed Approach:
for (String key : listAllMetadataKeys()) {
    Metadata metadata = extractMetadata(key);
    gc(metadata);
}
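To make the proposed loop concrete, here is a runnable version. The in-memory Map used as the backing store, the activeLedgers count, and the StreamingGc class are hypothetical; listAllMetadataKeys(), extractMetadata(), and gc() keep the names from the pseudocode, but their bodies are assumptions for illustration only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical metadata value: the set of active ledgers is reduced to a count.
final class Metadata {
    final String key;
    final int activeLedgers;

    Metadata(String key, int activeLedgers) {
        this.key = key;
        this.activeLedgers = activeLedgers;
    }
}

// Hypothetical store standing in for wherever metadata is actually read from.
final class StreamingGc {
    private final Map<String, Integer> store;
    final List<String> collected = new ArrayList<>();

    StreamingGc(Map<String, Integer> store) {
        this.store = store;
    }

    // Only keys are materialized up front; values are loaded per key.
    List<String> listAllMetadataKeys() {
        return new ArrayList<>(store.keySet());
    }

    Metadata extractMetadata(String key) {
        return new Metadata(key, store.get(key));
    }

    // GC a single entry: drop it if nothing in it is still active.
    void gc(Metadata metadata) {
        if (metadata.activeLedgers == 0) {
            store.remove(metadata.key);
            collected.add(metadata.key);
        }
    }

    void run() {
        // Iterates over a copy of the keys, so removing from the store is safe.
        for (String key : listAllMetadataKeys()) {
            Metadata metadata = extractMetadata(key); // one value in memory at a time
            gc(metadata);
        }
    }
}
```

In this sketch listAllMetadataKeys() still holds every key in memory; the saving comes from never holding more than one Metadata value at a time.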
Benefits:
- Reduced memory footprint: Avoids holding all metadata in memory at once.
- More timely cleanup: GC is performed immediately after extraction.
- Improved scalability: Handles large metadata sets more gracefully.
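One way to check the first two benefits is to count, for each approach, how many metadata objects have been extracted but not yet garbage-collected at any point. This is a hypothetical sketch: it counts pending objects rather than measuring heap bytes, and Meta, isLive, and both method names are illustrative, not existing code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

final class PeakComparison {
    // Hypothetical metadata object keyed by an int.
    static final class Meta {
        final int key;

        Meta(int key) {
            this.key = key;
        }
    }

    // Bulk: extract all n entries first, then run one GC pass over them.
    // Returns the maximum number of extracted-but-not-yet-GC'd entries.
    static int bulkMaxPending(int n, IntPredicate isLive, List<Integer> removed) {
        List<Meta> all = new ArrayList<>();
        int pending = 0, maxPending = 0;
        for (int k = 0; k < n; k++) {
            all.add(new Meta(k)); // extracted, not yet GC'd
            maxPending = Math.max(maxPending, ++pending);
        }
        for (Meta m : all) {
            if (!isLive.test(m.key)) removed.add(m.key);
            pending--;
        }
        return maxPending;
    }

    // Streaming: extract one entry and GC it before extracting the next.
    static int streamingMaxPending(int n, IntPredicate isLive, List<Integer> removed) {
        int pending = 0, maxPending = 0;
        for (int k = 0; k < n; k++) {
            Meta m = new Meta(k); // extracted
            maxPending = Math.max(maxPending, ++pending);
            if (!isLive.test(m.key)) removed.add(m.key);
            pending--; // GC'd; m is unreachable after this iteration
        }
        return maxPending;
    }
}
```

Both paths remove the same entries, but with n keys the bulk path peaks at n pending objects and cannot remove anything until all n are extracted, while the streaming path never has more than one pending.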