valkey icon indicating copy to clipboard operation
valkey copied to clipboard

[NEW] Add Dedicated Metrics for Expiration Table Lazyfree Monitoring

Open Scut-Corgis opened this issue 1 month ago • 2 comments

The problem/use-case that the feature addresses

emptyDbAsync lazily frees three kvstores (main keys, expires TTL table, and keys_with_volatile_items), but existing lazyfree_objects/lazyfreed_objects metrics only count the main keys table. This creates two critical gaps:

  1. Inaccurate monitoring: Operators can’t track lazyfree progress for expiration-related objects (common in TTL-heavy cache workloads), leading to blind spots in memory management.
  2. Unreliable testing: Verifying that expires/keys_with_volatile_items are properly released requires monitoring raw memory changes (error-prone) instead of explicit counters. As in the lazyfree.tcl test, the criterion for determining whether the asynchronous release is successful is the memory status.

Description of the feature

Add two new atomic metrics to track expiration table lazyfree independently, while preserving backward compatibility with existing metrics:

  1. New metrics:

    static _Atomic size_t lazyfree_expire_objects = 0;  // Pending expiration-related objects for lazyfree
    static _Atomic size_t lazyfreed_expire_objects = 0; // Expiration-related objects successfully lazyfreed
    
  2. Code modifications:

    • In emptyDbAsync: Increment both existing and new metrics (sum expires + keys_with_volatile_items for the new metric):
      atomic_fetch_add_explicit(&lazyfree_objects, kvstoreSize(oldkeys), memory_order_relaxed); // Keep existing
      atomic_fetch_add_explicit(&lazyfree_expire_objects, 
                                kvstoreSize(oldexpires) + kvstoreSize(oldkeyswithexpires), 
                                memory_order_relaxed); // New metric
      
    • In lazyfreeFreeDatabase: Update new metrics alongside existing ones:
      size_t numkeys = kvstoreSize(da1);
      size_t num_expire_objs = kvstoreSize(da2) + kvstoreSize(da3); // da2=oldexpires, da3=oldkeyswithexpires
      
      kvstoreRelease(da1); kvstoreRelease(da2); kvstoreRelease(da3);
      
      // Update existing metrics (unchanged)
      atomic_fetch_sub_explicit(&lazyfree_objects, numkeys, memory_order_relaxed);
      atomic_fetch_add_explicit(&lazyfreed_objects, numkeys, memory_order_relaxed);
      
      // Update new expiration metrics
      atomic_fetch_sub_explicit(&lazyfree_expire_objects, num_expire_objs, memory_order_relaxed);
      atomic_fetch_add_explicit(&lazyfreed_expire_objects, num_expire_objs, memory_order_relaxed);
      

Alternatives you've considered

  1. Merge expiration counts into existing metrics: Would break backward compatibility (existing users rely on lazyfree_objects representing only main table objects) and lose granularity.
  2. Reuse existing metric names with expanded scope: Causes confusion for operators monitoring main table lazyfree progress.
  3. No new metrics (extend existing): Fails to address the core need of distinguishing main vs. expiration table release—critical for debugging TTL-related memory issues.

The proposed approach balances granularity, compatibility, and simplicity, avoiding tradeoffs of alternatives.

Additional information

  • Overhead: Negligible— atomic increments/decrements have minimal performance impact.
  • Backward compatibility: Existing metrics and workflows remain unchanged; new metrics are opt-in for users needing expiration table visibility.
  • PR readiness: I’m willing to submit a PR implementing this feature if aligned with project goals.

Scut-Corgis avatar Nov 14 '25 04:11 Scut-Corgis

I don't think we will do this, there is no enough arguments to do it. @zuiderkwast Do you want to comment this one?

enjoy-binbin avatar Nov 24 '25 07:11 enjoy-binbin

@zuiderkwast Do you want to comment this one?

No, I don't want to comment. 😄 I don't have much experience with this. Maybe @hpatro or @ranshid wants to comment?

zuiderkwast avatar Nov 25 '25 09:11 zuiderkwast