[NEW] Add Dedicated Metrics for Expiration Table Lazyfree Monitoring
The problem/use-case that the feature addresses
emptyDbAsync lazily frees three kvstores (main keys, expires TTL table, and keys_with_volatile_items), but existing lazyfree_objects/lazyfreed_objects metrics only count the main keys table. This creates two critical gaps:
- Inaccurate monitoring: Operators can’t track lazyfree progress for expiration-related objects (common in TTL-heavy cache workloads), leading to blind spots in memory management.
- Unreliable testing: Verifying that
expires/keys_with_volatile_itemsare properly released requires monitoring raw memory changes (error-prone) instead of explicit counters. As in the lazyfree.tcl test, the criterion for determining whether the asynchronous release is successful is the memory status.
Description of the feature
Add two new atomic metrics to track expiration table lazyfree independently, while preserving backward compatibility with existing metrics:
-
New metrics:
static _Atomic size_t lazyfree_expire_objects = 0; // Pending expiration-related objects for lazyfree static _Atomic size_t lazyfreed_expire_objects = 0; // Expiration-related objects successfully lazyfreed -
Code modifications:
- In
emptyDbAsync: Increment both existing and new metrics (sumexpires+keys_with_volatile_itemsfor the new metric):atomic_fetch_add_explicit(&lazyfree_objects, kvstoreSize(oldkeys), memory_order_relaxed); // Keep existing atomic_fetch_add_explicit(&lazyfree_expire_objects, kvstoreSize(oldexpires) + kvstoreSize(oldkeyswithexpires), memory_order_relaxed); // New metric - In
lazyfreeFreeDatabase: Update new metrics alongside existing ones:size_t numkeys = kvstoreSize(da1); size_t num_expire_objs = kvstoreSize(da2) + kvstoreSize(da3); // da2=oldexpires, da3=oldkeyswithexpires kvstoreRelease(da1); kvstoreRelease(da2); kvstoreRelease(da3); // Update existing metrics (unchanged) atomic_fetch_sub_explicit(&lazyfree_objects, numkeys, memory_order_relaxed); atomic_fetch_add_explicit(&lazyfreed_objects, numkeys, memory_order_relaxed); // Update new expiration metrics atomic_fetch_sub_explicit(&lazyfree_expire_objects, num_expire_objs, memory_order_relaxed); atomic_fetch_add_explicit(&lazyfreed_expire_objects, num_expire_objs, memory_order_relaxed);
- In
Alternatives you've considered
- Merge expiration counts into existing metrics: Would break backward compatibility (existing users rely on
lazyfree_objectsrepresenting only main table objects) and lose granularity. - Reuse existing metric names with expanded scope: Causes confusion for operators monitoring main table lazyfree progress.
- No new metrics (extend existing): Fails to address the core need of distinguishing main vs. expiration table release—critical for debugging TTL-related memory issues.
The proposed approach balances granularity, compatibility, and simplicity, avoiding tradeoffs of alternatives.
Additional information
- Overhead: Negligible— atomic increments/decrements have minimal performance impact.
- Backward compatibility: Existing metrics and workflows remain unchanged; new metrics are opt-in for users needing expiration table visibility.
- PR readiness: I’m willing to submit a PR implementing this feature if aligned with project goals.
I don't think we will do this, there is no enough arguments to do it. @zuiderkwast Do you want to comment this one?
@zuiderkwast Do you want to comment this one?
No, I don't want to comment. 😄 I don't have much experience with this. Maybe @hpatro or @ranshid wants to comment?