
[API Proposal]: Introduce new memory cache library

Open rafal-mz opened this issue 7 months ago • 51 comments

Background and motivation

We have two popular and well-known memory cache libraries in .NET - System.Runtime.Caching and Microsoft.Extensions.Caching.Memory. As already described in this issue, there is room for improvement in their public APIs. There are also efficiency problems that hit at high scale. Our goal was therefore to implement a memory cache that is as efficient as possible for high-concurrency, high-traffic servers. During the process we found a few additional opportunities to do slightly better than the existing libraries.

  • Memory consumption in both of those caches is pretty high (~100 bytes per item).
  • No metrics are emitted by default (important for distributed systems usage).
  • The current model does not make it straightforward to create efficient variations of the core eviction algorithm; the ICacheEntry model is beefy, while IMemoryCache is a pretty shallow interface.
  • Both cache implementations use background threads / timers. Our implementation maintains its state in the set method, sparing some thread-pool overhead compared to the existing libraries. It is also friendlier to use, since it won't create a memory leak when not disposed (unlike System.Runtime.Caching, which allocates timers in its constructor).
  • The API does not always lead the consumer to the pit of success, since it accepts heap-allocated options objects in APIs called at runtime.

The RCache implementation is based on the open source BitFaster library.


Storing a million <Guid, string> (Guid.ToString()) pairs in the cache:

| Cache library | Memory usage |
| --- | --- |
| RCache | 11.27 MB |
| Microsoft.Extensions.Caching.Memory | 43.36 MB |

Storing <int, int> pairs (this is the best case for RCache, since there is no boxing on key or value). The latency includes the optional cost of maintaining telemetry state on the RCache side.

| Cache library | Remove | TryGet miss | TryGet hit | GetOrSet | GetOrSet dynamic TTL | Set | Set dynamic TTL |
| --- | --- | --- | --- | --- | --- | --- | --- |
| RCache | 6.5 ns | 10.0 ns | 11.4 ns | 14.7 ns | 15.6 ns | 28.9 ns | 32.2 ns |
| Microsoft.Extensions.Caching.Memory | 59.3 ns | 39.0 ns | 48.4 ns | 52.1 ns | 44.9 ns | 125.5 ns | 130.2 ns |
| System.Runtime.Caching | 59 ns | 54.8 ns | 107.0 ns | 175.8 ns | 239.7 ns | 1192.5 ns | 1216.1 ns |

Feature comparison table:

| Feature | RCache | Microsoft.Extensions.Caching.Memory | System.Runtime.Caching |
| --- | --- | --- | --- |
| Time-based eviction | yes | yes | yes |
| Sliding time to evict | yes | yes | yes |
| Callbacks on eviction | yes** | yes | yes |
| Metrics | yes | yes | no |
| Named caches | yes | no | no |
| Generics support | yes | no | no |
| Priority-based eviction | yes*** | yes | no |
| Runtime entry size calculation | no | yes | no |
| Dynamic time to evict | yes | yes | yes |
| Item update notification | no | yes | no |

** RCache disposes entries on eviction, and this can serve as an eviction callback. The limitation, though, is that eviction happens while cache state is updated (during the set method), not directly when an item expires.

*** The algorithm we use has a notion of three priorities (hot, warm, cold) and respects them while rotating items. However, we don't let the customer define priorities or have direct control over them.

API Proposal

namespace Microsoft.Extensions.Cache.Memory;

/// <summary>
/// Static factory of <see cref="T:Microsoft.Extensions.Cache.Memory.RCache`2" /> implementations.
/// </summary>
/// <remarks>
/// Keys are never disposed by the implementation, even when they are disposable types.
/// Values are disposed on expiration or eviction if they implement the <see cref="T:System.IDisposable" /> interface.
/// Values are synchronized during disposal, so their dispose method is guaranteed to be called in a thread-safe manner.
/// </remarks>
public static class RCache
{
    /// <summary>
    /// Creates <see cref="T:Microsoft.Extensions.Cache.Memory.RCache`2" /> with LRU (Least Recently Used) eviction policy and default options.
    /// </summary>
    /// <param name="name">Name of the cache.</param>
    /// <typeparam name="TKey">Type of keys stored in the cache.</typeparam>
    /// <typeparam name="TValue">Type of values stored in the cache.</typeparam>
    /// <returns>Ready to use <see cref="T:Microsoft.Extensions.Cache.Memory.RCache`2" />.</returns>
    public static RCache<TKey, TValue> CreateLru<TKey, TValue>(string name) where TKey : notnull;

    /// <summary>
    /// Creates <see cref="T:Microsoft.Extensions.Cache.Memory.RCache`2" /> with LRU (Least Recently Used) eviction policy and custom name and options.
    /// </summary>
    /// <remarks>
    /// For AOT builds, the switch implemented in this method generates code for each possible generic permutation, since
    /// it is not known at compile time which one will be used at runtime. Such generation currently costs around 47 KB of final binary size for a single cache usage.
    /// The tested hello-world app had around 4.5 MB of total size, so this cost was around 1% of the total.
    /// In non-AOT mode this method costs 391 B.
    /// </remarks>
    /// <typeparam name="TKey">Type of keys stored in the cache.</typeparam>
    /// <typeparam name="TValue">Type of values stored in the cache.</typeparam>
    /// <param name="name">Name of the cache used to identify telemetry that it emits.</param>
    /// <param name="options">Cache options.</param>
    /// <returns>Ready to use <see cref="T:Microsoft.Extensions.Cache.Memory.RCache`2" />.</returns>
    public static RCache<TKey, TValue> CreateLru<TKey, TValue>(string name, RCacheLruOptions<TKey, TValue> options) where TKey : notnull;
}

/// <summary>
/// Minimalistic memory cache.
/// </summary>
/// <typeparam name="TKey">Type of keys stored in the cache.</typeparam>
/// <typeparam name="TValue">Type of values stored in the cache.</typeparam>
public abstract class RCache<TKey, TValue> : IDisposable
 where TKey : notnull
{
    /// <summary>
    /// Gets the name of the cache instance.
    /// </summary>
    public abstract string Name { get; }

    /// <summary>
    /// Gets the capacity of the cache.
    /// </summary>
    public abstract int Capacity { get; }

    /// <summary>
    /// Gets a value from the cache.
    /// </summary>
    /// <param name="key">Data identifying the requested value.</param>
    /// <param name="value">Value to associated with the key or <see langword="null" /> when not found.</param>
    /// <returns>
    /// <see langword="true" /> when value exists in the cache, <see langword="false" /> otherwise.
    /// </returns>
    public abstract bool TryGet(TKey key, [MaybeNullWhen(false)] out TValue value);

    /// <summary>
    /// Sets a value to the cache.
    /// </summary>
    /// <remarks>
    /// The value's time to expire is set to the global default value defined for this cache instance.
    /// </remarks>
    /// <param name="key">Data identifying the requested value.</param>
    /// <param name="value">Value to cache on the heap.</param>
    /// <returns>
    /// Data just set to cache.
    /// </returns>
    public abstract TValue Set(TKey key, TValue value);

    /// <summary>
    /// Sets a value to the cache.
    /// </summary>
    /// <remarks>
    /// After time to expire has passed, a value is not retrievable from the cache.
    /// At the same time, it might not get disposed immediately and cache might keep the root for it for some time.
    /// </remarks>
    /// <param name="key">Data identifying the requested value.</param>
    /// <param name="value">Value to cache on the heap.</param>
    /// <param name="timeToExpire">Time after which value should be removed from the cache. It cannot be less than one millisecond.</param>
    /// <returns>
    /// Data just set to cache.
    /// </returns>
    public abstract TValue Set(TKey key, TValue value, TimeSpan timeToExpire);

    /// <summary>
    /// Gets a value, or sets it if it doesn't exist.
    /// </summary>
    /// <remarks>
    /// The value's time to expire is set to the global default value defined for this cache instance.
    /// </remarks>
    /// <typeparam name="TState">Type of the state passed to the function.</typeparam>
    /// <param name="key">Data identifying the requested value.</param>
    /// <param name="state">State passed to the factory function.</param>
    /// <param name="factory">Factory function that will be used to lazy create value when missed.</param>
    /// <returns>
    /// Data retrieved from the cache or created by the passed factory function.
    /// </returns>
    public abstract TValue GetOrSet<TState>(TKey key, TState state, Func<TKey, TState, TValue> factory);

    /// <summary>
    /// Gets a value, or sets it if it doesn't exist.
    /// </summary>
    /// <remarks>
    /// After time to expire has passed, a value is not retrievable from the cache.
    /// At the same time, it might not get disposed immediately and cache might keep the root for it for some time.
    /// </remarks>
    /// <typeparam name="TState">Type of the state passed to the function.</typeparam>
    /// <param name="key">Data identifying the requested value.</param>
    /// <param name="state">State passed to the factory function.</param>
    /// <param name="factory">Factory function that will be used to lazy create value when missed.</param>
    /// <returns>
    /// Data retrieved from the cache or created by the passed factory function.
    /// </returns>
    public abstract TValue GetOrSet<TState>(TKey key, TState state, Func<TKey, TState, (TValue value, TimeSpan timeToExpire)> factory);

    /// <summary>
    /// Gets a value, or sets it if it doesn't exist.
    /// </summary>
    /// <remarks>
    /// The value's time to expire is set to the global default value defined for this cache instance.
    /// </remarks>
    /// <param name="key">Data identifying the requested value.</param>
    /// <param name="factory">Factory function that will be used to lazy create value when missed.</param>
    /// <returns>
    /// Data retrieved from the cache or created by the passed factory function.
    /// </returns>
    public abstract TValue GetOrSet(TKey key, Func<TKey, TValue> factory);
     
    /// <summary>
    /// Gets a value, or sets it if it doesn't exist.
    /// </summary>
    /// <remarks>
    /// After time to expire has passed, a value is not retrievable from the cache.
    /// At the same time, it might not get disposed immediately and cache might keep the root for it for some time.
    /// </remarks>
    /// <param name="key">Data identifying the requested value.</param>
    /// <param name="factory">Factory function that will be used to lazy create value when missed.</param>
    /// <returns>
    /// Data retrieved from the cache or created by the passed factory function.
    /// </returns>
    public abstract TValue GetOrSet(TKey key, Func<TKey, (TValue value, TimeSpan timeToExpire)> factory);

    /// <summary>
    /// Gets the count of items in the cache.
    /// </summary>
    /// <remarks>
    /// This method may take a read-write lock on the cache.
    /// Avoid using it in production code.
    /// </remarks>
    /// <returns>
    /// The number of items currently stored in the cache.
    /// </returns>
    [EditorBrowsable(EditorBrowsableState.Advanced)]
    public abstract int GetCount();

    /// <summary>
    /// Removes all elements from the cache.
    /// </summary>
    public abstract void Clear();

    /// <summary>
    /// Attempts to remove the value that has the specified key.
    /// </summary>
    /// <param name="key">The key of the element to remove.</param>
    /// <returns>
    /// <see langword="true" /> when value was removed, and <see langword="false" /> when key is not found.
    /// </returns>
    public abstract bool Remove(TKey key);

    /// <summary>
    /// Removes all expired entries from the cache.
    /// </summary>
    /// <remarks>
    /// Call this method when you use the cache in read-only mode for long periods of time.
    /// Cache maintenance is normally performed as part of other operations (on misses or value insertion).
    /// This method manually triggers the eviction mechanism to get rid of expired references held by the cache.
    /// </remarks>
    public abstract void RemoveExpiredEntries();
    
    /// <summary>
    /// Performs application-defined tasks associated with freeing, releasing, or resetting unmanaged resources.
    /// </summary>
    /// <remarks>
    /// The cache is not usable after calling this method.
    /// The cache disposes all items it holds at the moment of disposal.
    /// </remarks>
    public abstract void Dispose();
}

/// <summary>
/// Extension methods for caching.
/// </summary>
public static class RCacheExtensions
{
    /// <summary>
    /// Add LRU (Least Recently Used) implementation of memory cache to service collection.
    /// </summary>
    /// <typeparam name="TKey">Type of keys stored in the cache.</typeparam>
    /// <typeparam name="TValue">Type of values stored in the cache.</typeparam>
    /// <param name="services">Service collection to register cache into.</param>
    /// <returns>Passed instance of <see cref="T:Microsoft.Extensions.DependencyInjection.IServiceCollection" /> for further configuration.</returns>
    /// <exception cref="T:System.ArgumentNullException">When passed <paramref name="services" /> are <see langword="null" />.</exception>
    public static IServiceCollection AddLruRCache<TKey, TValue>(this IServiceCollection services) where TKey : notnull;

    /// <summary>
    /// Add named LRU (Least Recently Used) implementation of memory cache to service collection.
    /// </summary>
    /// <typeparam name="TKey">Type of keys stored in the cache.</typeparam>
    /// <typeparam name="TValue">Type of values stored in the cache.</typeparam>
    /// <param name="services">Service collection to register cache into.</param>
    /// <param name="name">Name of the cache, used to reference it from <see cref="T:Microsoft.Extensions.Cache.Memory.IRCacheProvider" /> and distinguish telemetry that it is producing.</param>
    /// <returns>Passed instance of <see cref="T:Microsoft.Extensions.DependencyInjection.IServiceCollection" /> for further configuration.</returns>
    /// <exception cref="T:System.ArgumentNullException">When passed <paramref name="services" /> are <see langword="null" />.</exception>
    /// <exception cref="T:System.ArgumentException">When passed <paramref name="name" /> is <see langword="null" /> or empty.</exception>
    public static IServiceCollection AddLruRCache<TKey, TValue>(this IServiceCollection services, string name) where TKey : notnull;

    /// <summary>
    /// Add LRU (Least Recently Used) implementation of memory cache with customized options to service collection.
    /// </summary>
    /// <typeparam name="TKey">Type of keys stored in the cache.</typeparam>
    /// <typeparam name="TValue">Type of values stored in the cache.</typeparam>
    /// <param name="services">Service collection to register cache into.</param>
    /// <param name="configure">A function used to configure cache options.</param>
    /// <returns>Passed instance of <see cref="T:Microsoft.Extensions.DependencyInjection.IServiceCollection" /> for further configuration.</returns>
    /// <exception cref="T:System.ArgumentNullException">When passed <paramref name="services" /> or <paramref name="configure" /> are <see langword="null" />.</exception>
    public static IServiceCollection AddLruRCache<TKey, TValue>(this IServiceCollection services, Action<RCacheLruOptions<TKey, TValue>> configure) where TKey : notnull;

    /// <summary>
    /// Add LRU (Least Recently Used) implementation of memory cache with customized options to service collection.
    /// </summary>
    /// <typeparam name="TKey">Type of keys stored in the cache.</typeparam>
    /// <typeparam name="TValue">Type of values stored in the cache.</typeparam>
    /// <param name="services">Service collection to register cache into.</param>
    /// <param name="section">Configuration part that defines cache options.</param>
    /// <returns>Passed instance of <see cref="T:Microsoft.Extensions.DependencyInjection.IServiceCollection" /> for further configuration.</returns>
    /// <exception cref="T:System.ArgumentNullException">When passed <paramref name="services" /> or <paramref name="section" /> are <see langword="null" />.</exception>
    [UnconditionalSuppressMessage("AOT", "IL3050:Calling members annotated with 'RequiresDynamicCodeAttribute' may break functionality when AOT compiling.", Justification = "<Pending>")]
    public static IServiceCollection AddLruRCache<TKey, TValue>(this IServiceCollection services, IConfigurationSection section) where TKey : notnull;

    /// <summary>
    /// Add named LRU (Least Recently Used) implementation of memory cache with customized options to service collection.
    /// </summary>
    /// <typeparam name="TKey">Type of keys stored in the cache.</typeparam>
    /// <typeparam name="TValue">Type of values stored in the cache.</typeparam>
    /// <param name="services">Service collection to register cache into.</param>
    /// <param name="name">Name of the cache, used to reference it from <see cref="T:Microsoft.Extensions.Cache.Memory.IRCacheProvider" /> and distinguish telemetry that it is producing.</param>
    /// <param name="section">Configuration part that defines cache options.</param>
    /// <returns>Passed instance of <see cref="T:Microsoft.Extensions.DependencyInjection.IServiceCollection" /> for further configuration.</returns>
    /// <exception cref="T:System.ArgumentNullException">When passed <paramref name="services" /> or <paramref name="section" /> are <see langword="null" />.</exception>
    /// <exception cref="T:System.ArgumentException">When passed <paramref name="name" /> is <see langword="null" /> or empty.</exception>
    [UnconditionalSuppressMessage("AOT", "IL3050:Calling members annotated with 'RequiresDynamicCodeAttribute' may break functionality when AOT compiling.", Justification = "<Pending>")]
    public static IServiceCollection AddLruRCache<TKey, TValue>(this IServiceCollection services, string name, IConfigurationSection section) where TKey : notnull;

    /// <summary>
    /// Add named LRU (Least Recently Used) implementation of memory cache with customized options to service collection.
    /// </summary>
    /// <typeparam name="TKey">Type of keys stored in the cache.</typeparam>
    /// <typeparam name="TValue">Type of values stored in the cache.</typeparam>
    /// <param name="services">Service collection to register cache into.</param>
    /// <param name="name">Name of the cache, used to reference it from <see cref="T:Microsoft.Extensions.Cache.Memory.IRCacheProvider" /> and distinguish telemetry that it is producing.</param>
    /// <param name="configure">A function used to configure cache options.</param>
    /// <returns>Passed instance of <see cref="T:Microsoft.Extensions.DependencyInjection.IServiceCollection" /> for further configuration.</returns>
    /// <exception cref="T:System.ArgumentNullException">When passed <paramref name="services" /> or <paramref name="configure" /> are <see langword="null" />.</exception>
    /// <exception cref="T:System.ArgumentException">When passed <paramref name="name" /> is <see langword="null" /> or empty.</exception>
    public static IServiceCollection AddLruRCache<TKey, TValue>(this IServiceCollection services, string name, Action<RCacheLruOptions<TKey, TValue>> configure) where TKey : notnull;
}

/// <summary>
/// Options for LRU (Least recently used) implementation of <see cref="T:Microsoft.Extensions.Cache.Memory.RCache`2" />.
/// </summary>
/// <typeparam name="TKey">Type of keys stored in the cache.</typeparam>
/// <typeparam name="TValue">Type of values stored in the cache.</typeparam>
public class RCacheLruOptions<TKey, TValue> where TKey : notnull
{
    /// <summary>
    /// The value cannot be less than one millisecond, since that is the precision of the eviction mechanism.
    /// </summary>
    public static readonly TimeSpan MinTimeToEvict;

    /// <summary>
    /// Gets or sets the maximum number of values that can be stored in the cache.
    /// </summary>
    /// <remarks>
    /// Default is 3, which is the minimum capacity required by the implementation to work.
    /// </remarks>
    [Range(3, int.MaxValue)]
    public int Capacity { get; set; }

    /// <summary>
    /// Gets or sets the default time to evict of cache values.
    /// </summary>
    /// <remarks>
    /// This value is used by APIs that do not accept a time-to-evict parameter.
    /// Default is 5 minutes.
    /// If you never want your items to be evicted, set this value to <see cref="F:System.TimeSpan.MaxValue" />.
    /// </remarks>
    [TimeSpan("00:00:00:0001")]
    public TimeSpan DefaultTimeToEvict { get; set; }

    /// <summary>
    /// Gets or sets the time to evict that an item is prolonged by after a cache hit.
    /// </summary>
    /// <remarks>
    /// Default set to 5 minutes.
    /// This value is ignored when <see cref="P:Microsoft.Extensions.Cache.Memory.RCacheLruOptions`2.ExtendTimeToEvictAfterHit" /> is <see langword="false" />.
    /// </remarks>
    [TimeSpan("00:00:00:0001")]
    public TimeSpan ExtendedTimeToEvictAfterHit { get; set; }

    /// <summary>
    /// Gets or sets a value indicating whether an entry's time to evict should be refreshed after a cache hit.
    /// </summary>
    /// <remarks>
    /// Default set to <see langword="false" />.
    /// </remarks>
    public bool ExtendTimeToEvictAfterHit { get; set; }

    /// <summary>
    /// Gets or sets the cache's level of concurrency.
    /// </summary>
    /// <remarks>
    /// Default is set to <see cref="P:System.Environment.ProcessorCount" /> for machines with up to 8 cores.
    /// Otherwise, it is <see cref="P:System.Environment.ProcessorCount" /> / <see cref="F:Microsoft.Extensions.Cache.Memory.RCacheLruOptions`2.ConcurrencyLevelDivisor" />.
    /// Increase this value if you observe lock contention.
    /// </remarks>
    [EditorBrowsable(EditorBrowsableState.Advanced)]
    public int ConcurrencyLevel { get; set; }

    /// <summary>
    /// Gets or sets a key comparer.
    /// </summary>
    /// <remarks>
    /// Default is set to <see cref="P:System.Collections.Generic.EqualityComparer`1.Default" />.
    /// For string keys it is <see cref="P:System.StringComparer.Ordinal" />.
    /// </remarks>
    [EditorBrowsable(EditorBrowsableState.Advanced)]
    public IEqualityComparer<TKey> KeyComparer { get; set; }

    /// <summary>
    /// Gets or sets a value indicating how often cache metrics are refreshed internally.
    /// </summary>
    /// <remarks>
    /// Querying all metrics requires summing values for every one of them across different threads.
    /// This may not be optimal for a cache, so changing this value lets you balance the cost of querying against how many data points you collect about the cache state.
    /// Setting this value very low and querying metrics at short intervals may interfere with how efficiently the cache itself works.
    /// Default set to 30 seconds.
    /// </remarks>
    [TimeSpan("00:00:05")]
    public TimeSpan MetricPublicationInterval { get; set; }

    /// <summary>
    /// Gets or sets a value indicating whether metrics are published or not.
    /// </summary>
    /// <remarks>
    /// Metrics are published by default (the value is set to <see langword="true" />).
    /// </remarks>
    public bool PublishMetrics { get; set; }
}

/// <summary>
/// Metrics published by <see cref="T:Microsoft.Extensions.Cache.Memory.RCache`2" />.
/// </summary>
public static class RCacheMetrics
{
    /// <summary>
    /// Name of the <see cref="T:System.Diagnostics.Metrics.Meter" /> to listen to for obtaining cache metrics.
    /// </summary>
    /// <example>
    /// An RCache with name "test" will publish metrics with the tag "cache-name" equal to "test".
    /// </example>
    public const string LruCacheMeterName = "cache.memory.lru";

    /// <summary>
    /// Gets the total number of cache queries that were successful.
    /// </summary>
    /// <remarks>
    /// Metric is exposed as <see cref="T:System.Diagnostics.Metrics.ObservableCounter`1" /> with value being <see cref="T:System.Int64" />.
    /// </remarks>
    public const string Hits = "process.cache.memory.hits.count";

    /// <summary>
    /// Gets the total number of unsuccessful cache queries.
    /// </summary>
    /// <remarks>
    /// Metric is exposed as <see cref="T:System.Diagnostics.Metrics.ObservableCounter`1" /> with value being <see cref="T:System.Int64" />.
    /// </remarks>
    public const string Misses = "process.cache.memory.misses.count";

    /// <summary>
    /// Gets the total number of expired values.
    /// </summary>
    /// <remarks>
    /// Metric is exposed as <see cref="T:System.Diagnostics.Metrics.ObservableCounter`1" /> with value being <see cref="T:System.Int64" />.
    /// Expired values are ones removed from the cache because they were too old.
    /// </remarks>
    public const string Expirations = "process.cache.memory.expirations.count";

    /// <summary>
    /// Gets the total number of values added to cache.
    /// </summary>
    /// <remarks>
    /// This value refers to total calls to set method, and does not include updates.
    /// Metric is exposed as <see cref="T:System.Diagnostics.Metrics.ObservableCounter`1" /> with value being <see cref="T:System.Int64" />.
    /// </remarks>
    public const string Adds = "process.cache.memory.adds.count";

    /// <summary>
    /// Gets the total number of cache removals.
    /// </summary>
    /// <remarks>
    /// This value refers to total calls to remove method, and does not include evictions.
    /// If you are interested in metric of total values being removed from the cache, add <see cref="F:Microsoft.Extensions.Cache.Memory.RCacheMetrics.Evictions" /> and <see cref="F:Microsoft.Extensions.Cache.Memory.RCacheMetrics.Expirations" />.
    /// Metric is exposed as <see cref="T:System.Diagnostics.Metrics.ObservableCounter`1" /> with value being <see cref="T:System.Int64" />.
    /// </remarks>
    public const string Removals = "process.cache.memory.removals.count";

    /// <summary>
    /// Gets the total number of evicted values.
    /// </summary>
    /// <remarks>
    /// Metric is exposed as <see cref="T:System.Diagnostics.Metrics.ObservableCounter`1" /> with value being <see cref="T:System.Int64" />.
    /// Evicted values are those removed based on the implementation's eviction policy and not time expiration or intentional removal.
    /// </remarks>
    public const string Evictions = "process.cache.memory.evictions.count";

    /// <summary>
    /// Gets the total number of updated values.
    /// </summary>
    /// <remarks>
    /// Metric is exposed as <see cref="T:System.Diagnostics.Metrics.ObservableCounter`1" /> with value being <see cref="T:System.Int64" />.
    /// </remarks>
    public const string Updates = "process.cache.memory.updates.count";

    /// <summary>
    /// Gets the total number of values in the cache over time.
    /// </summary>
    /// <remarks>
    /// Metric is exposed as <see cref="T:System.Diagnostics.Metrics.ObservableGauge`1" /> with value being <see cref="T:System.Int64" />.
    /// </remarks>
    public const string Count = "process.cache.memory.count";

    /// <summary>
    /// Gets the number of elements compacted in the cache.
    /// </summary>
    /// <remarks>
    /// Metric is exposed as <see cref="T:System.Diagnostics.Metrics.Histogram`1" /> with value being <see cref="T:System.Int64" />.
    /// </remarks>
    public const string Compacted = "process.cache.memory.compacted_items.count";
}
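
For illustration only, here is a sketch of how these metrics could be consumed with a System.Diagnostics.Metrics.MeterListener, assuming only the meter and instrument names proposed above (everything else is just an example):

using System;
using System.Diagnostics.Metrics;

var listener = new MeterListener();

// Subscribe only to instruments published under the proposed LRU cache meter.
listener.InstrumentPublished = (instrument, l) =>
{
    if (instrument.Meter.Name == RCacheMetrics.LruCacheMeterName)
    {
        l.EnableMeasurementEvents(instrument);
    }
};

// The proposed counters are long-valued, e.g. "process.cache.memory.hits.count".
listener.SetMeasurementEventCallback<long>((instrument, measurement, tags, state) =>
    Console.WriteLine($"{instrument.Name} = {measurement}"));

listener.Start();

// Observable counters are pull-based, so poll them periodically.
listener.RecordObservableInstruments();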

API Usage

Add default cache to DI.

IServiceCollection services;

services.AddLruRCache<Guid, User>();

services.AddLruRCache<Guid, User>("different", options => options.PublishMetrics = false);

Get default cache from DI.

namespace Example;

using Microsoft.Extensions.Cache;

public class UserService
{
    private readonly RCache<Guid, User> _users;

    public UserService(RCache<Guid, User> users)
    {                  ^^^^^^^^^^^^^^^^^^^^^^^^
        _users = users;
    }
}

Create cache from factory.

namespace Example;

using Microsoft.Extensions.Cache;

public class UserService
{
    private readonly RCache<Guid, User> _users;

    public UserService()
    {
        _users = RCache.CreateLru<Guid, User>("my-cache-name");
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    }
}
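
For completeness, here is a sketch of basic operations against the proposed API (User is the same placeholder type as in the examples above; the TTL and state values are arbitrary):

var cache = RCache.CreateLru<Guid, User>("users");

var id = Guid.NewGuid();

// Set with the cache-wide default time to expire, or with an explicit TTL.
cache.Set(id, new User());
cache.Set(id, new User(), TimeSpan.FromMinutes(10));

// TryGet reports a miss via its return value.
if (cache.TryGet(id, out var cached))
{
    // use cached
}

// The TState overload passes state (here, an illustrative region string)
// without capturing locals in a closure.
var user = cache.GetOrSet(id, "eu-west", static (key, region) => new User());

cache.Remove(id);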

Alternative Designs

Callbacks and notifications

We could introduce a feature from Microsoft.Extensions.Caching.Memory that allows tracking whether a value was changed, or implement eviction callbacks. Instead of storing a callback with each entry, or some change tokens, we could use the System.Threading.Channels library. Through an exposed ChannelReader, the consumer could react to events like eviction, mutation and so on. I expect this design to be more memory/CPU efficient than the existing one.
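
To make the idea concrete, here is a rough sketch of the shape such a channel-based event stream could take (purely hypothetical, not part of the proposal):

public enum RCacheEventKind { Evicted, Expired, Updated }

public readonly record struct RCacheEvent<TKey>(RCacheEventKind Kind, TKey Key);

public abstract class ObservableRCache<TKey, TValue> : RCache<TKey, TValue>
    where TKey : notnull
{
    // A single event stream per cache instead of a callback stored per entry.
    public abstract System.Threading.Channels.ChannelReader<RCacheEvent<TKey>> Events { get; }
}

// Consumers would then do something like:
// await foreach (var e in cache.Events.ReadAllAsync()) { ... }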

Size check

We could implement an item size limit through an interface implemented on the stored type by the library client.

public interface ISizer
{
   int GetSizeInBytes();
}

This design would allow us to implement a size check without boxing TValue or introducing CPU branches on the hot path.
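
For illustration, a stored type could report its own size roughly like this (the fixed overhead constant is just a placeholder):

public sealed class CachedPayload : ISizer
{
    public byte[] Data { get; init; } = Array.Empty<byte>();

    // Rough estimate: array contents plus a fixed per-object overhead.
    public int GetSizeInBytes() => Data.Length + 32;
}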

Asynchronous interface

We wanted to be explicit that everything in RCache is sync. This approach allows us to stay out of distributed-systems problems, since async caches require much richer semantics. We are going to propose another interface in the future that covers asynchronous cache scenarios and more.

Enumeration

We decided not to implement enumeration on RCache:

  • Enumeration is costly on a highly concurrent data structure due to the need for synchronization.
  • We didn't find evidence that this interface is essential for a memory cache data structure.

Risks

Another cache

This is yet another memory cache implementation, which may confuse the .NET community. We should be clear in framing the purpose of this interface, and make sure the design is extensible enough that we don't need to repeat the same process in the future.

Layering

Is dotnet/extensions the right place for this code? To me it looks like a general purpose data structure that should be available for the whole stack. Should we introduce it to the runtime repository?

Deferred removal

Cache item removal is deferred, which can keep objects alive for longer than they need to be.

rafal-mz avatar Nov 29 '23 00:11 rafal-mz

Does it interoperate with TimeProvider?

KalleOlaviNiemitalo avatar Nov 30 '23 19:11 KalleOlaviNiemitalo

No, it doesn't. The whole purpose of this library is to speed things up. We would need to introduce the time provider in the hot path, which adds virtual dispatch overhead. We could add a 'FakeLruRCache' which would be no different from the original implementation besides using TimeProvider. That would allow testing the algorithm alone.

update

Actually, we could add TimeProvider to the options as well, and ignore it unless it is overridden manually. I think that would solve the perf concern.

rafal-mz avatar Nov 30 '23 19:11 rafal-mz

What is the thread safety model here of the valueFactory? I typically wrap MemoryCache to ensure that the value factory is only called once in case of a cache miss. This always seemed self evident to me, since you wouldn't be caching in the first place if the valueFactory was cheap.

And if there is a locking mechanism, is it global or is it dependent on the key, like in ConcurrentDictionary?

amoerie avatar Dec 01 '23 11:12 amoerie

What is the thread safety model here of the valueFactory?

We are not preventing stampedes at the moment, so there is a chance that the factory is called more than once. Our model is basically the same as what ConcurrentDictionary.GetOrAdd does. The solutions I see:

  • Lazy{T} (sketched below),
  • Using the GetOrSet overload that accepts TState and implementing locking in there,
  • Adding one more overload to RCache, like GetOrSetLazy (?), that encapsulates the Lazy approach in the cache itself,
  • An extension method over RCache, shipped with the library, that does one of the above,
  • Explicitly adding a different cache implementation like 'LruAtomicWrites'.
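
For example, a minimal sketch of the Lazy{T} option, assuming (as with ConcurrentDictionary.GetOrAdd) that concurrent callers get the stored entry back; Report and BuildExpensiveReport are placeholders:

var cache = RCache.CreateLru<Guid, Lazy<Report>>("reports");

Report GetReport(Guid id)
{
    var lazy = cache.GetOrSet(id, key => new Lazy<Report>(() => BuildExpensiveReport(key)));

    // Lazy<T> defaults to ExecutionAndPublication, so the factory body runs once
    // even if several threads race on the same key.
    return lazy.Value;
}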

If there is something I am missing let me know.

And if there is a locking mechanism, is it global or is it dependent on the key, like in ConcurrentDictionary?

RCache wraps a ConcurrentDictionary and has no global locks.

rafal-mz avatar Dec 01 '23 13:12 rafal-mz

Nice. I like it. If the Lazy<Item> use case for expensive value factories is nicely documented and tested (single execution guaranteed), then I believe it does not need extra API support.

What I do VERY often is Lazy<Task<TValue>>, so that concurrent requests await the same long-running task. Please take this into account in examples, tests and documentation. 🙏
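
For reference, that pattern against the proposed API might look roughly like this (LoadUserAsync is a placeholder):

var cache = RCache.CreateLru<Guid, Lazy<Task<User>>>("users");

Task<User> GetUserAsync(Guid id) =>
    cache.GetOrSet(id, key => new Lazy<Task<User>>(() => LoadUserAsync(key))).Value;

// Concurrent callers for the same key await the same in-flight task.
// Caveat: a faulted task stays cached until the entry expires or is removed.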

Richard004 avatar Dec 01 '23 15:12 Richard004

Please consider support for "keyed service dependency injection".
[FromKeyedServices("usersCache")] [FromKeyedServices("projectsCache")] Typically I would configure expiration times on the level of the keyed instance of cache for all items in the once cache. But maybe your idea about naming these and obtaining from factory is a better consideration.

Richard004 avatar Dec 01 '23 15:12 Richard004

The library was originally developed before keyed service support was released, so it comes with:

public interface IRCacheProvider
{
  public RCache<TKey, TValue> Get<TKey, TValue>(string name);
}

and getting it by name from DI:

namespace Example;

using Microsoft.Extensions.Cache;

public class UserService
{
    private readonly RCache<Guid, User> _users;

    public UserService(IRCacheProvider caches)
    {
        _users = caches.Get<Guid, User>("my-cache-name");
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    }
}

I didn't include it in the proposal because it will be changed to use keyed services.
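
For reference, a keyed-services shape could look roughly like this (purely hypothetical, sketched on top of .NET 8 keyed DI):

services.AddKeyedSingleton("usersCache", (sp, key) => RCache.CreateLru<Guid, User>((string)key!));

public class UserService
{
    private readonly RCache<Guid, User> _users;

    public UserService([FromKeyedServices("usersCache")] RCache<Guid, User> users)
    {
        _users = users;
    }
}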

rafal-mz avatar Dec 01 '23 15:12 rafal-mz

@rafal-mz Is RCache in a repo somewhere to peek at? I've used and even customized Bitfaster in the past so curious to see what changes were made in this case.

to11mtm avatar Dec 01 '23 19:12 to11mtm

RCache is not yet public. I will create a PR and link here as soon as it is possible.

rafal-mz avatar Dec 01 '23 20:12 rafal-mz

Interesting proposal. Thanks for asking the community for feedback. I am sure lots of us value effort being put into the caching bits of the framework.

For me the Lazy ("cache stampede") factory evaluation requirement is pretty fundamental. Personally I struggle to find use cases that warrant a cache without lazy factory. Either not needing a lazy factory shows you don't actually need a cache at all (aka premature optimisation), or you need a cache such that it must have a lazy factory to not overwhelm your system. My experience says it should be lazy factory by default, and lazy factory is what people expect when using a cache for the first time. But I totally accept I'm biased as the author of a library that does just that! I would like to hear from use cases that need a cache without lazy factory behaviour.

And a couple of questions:

  • Can you explain the name RCache? What does it stand for? Doesn't seem intuitive to me.
  • I understand the need to add a new cache API for higher performance caching but why not just fix/expand the existing Microsoft libraries? I'm sure there is good reason but it needs setting out in writing because a framework with 3 official cache libraries is not a great experience either.
  • What is the difference between the proposal and the existing open source Bitfaster library? Could that be improved instead? Why not just support and direct people to existing open source community work if they need the highest performance cache?

alastairtree avatar Dec 02 '23 00:12 alastairtree

Thanks for your feedback. I really appreciate it.

Either not needing a lazy factory shows you don't actually need a cache at all (aka premature optimisation).

I would not go that far. A cache can be used as a general purpose data structure that keeps memory constrained through an eviction policy and a capacity limit. I've seen (and written a few myself :-)) far too many concurrent dictionaries called 'cache' that were accumulating memory indefinitely. RCache is just a concurrent dictionary wrapper that adds an efficient eviction policy, so I would not limit the use cases to 'costly to create'. Sometimes it may be 'costly to keep forever'. Generally, the cost of creation matters less the longer the entry's TTL is.

So far I've had an opinion that fast and simple is better than safe and slow. Now I am not sure. Need to think it through.

Can you explain the name RCache? What does it stand for? Doesn't seem intuitive to me.

We just had to have a different name than existing MemoryCache. RCache was developed in a project that has a name starting with 'R'.

I understand the need to add a new cache API for higher performance caching but why not just fix/expand the existing Microsoft libraries?

The .NET team has a long-standing culture of keeping strict backward compatibility guarantees. That's why expanding is hard when the original implementations are not abstract classes. Microsoft.Extensions.Caching.Memory is getting better with every version, but if the interface is locked, there is only so much you can do.

Hopefully the interface we propose now is extensible enough that in the future we can still improve the library and not introduce yet another API.

I'm sure there is good reason but it needs setting out in writing because a framework with 3 official cache libraries is not a great experience either.

We are aware of this problem; that's why we call it out in the risks. As mentioned above, we cannot remove, we can only add.

What is the difference between the proposal and the existing open source Bitfaster library? Could that be improved instead? Why not just support and direct people to existing existing open source community work if they need the highest performance cache?

RCache so far implements one algorithm out of the many available in BitFaster. Some technical details in RCache are different compared to the original. The author of BitFaster is a Microsoft employee and was involved in the process.

It's worth mentioning that there is also a layering problem. For instance, for a library to be used in the runtime repo, it needs to be pulled in there. This is also something we call out in the risks.

rafal-mz avatar Dec 02 '23 00:12 rafal-mz

Interesting proposal. Thanks for asking the community for feedback. I am sure lots of us value effort being put into the caching bits of the framework.

For me the Lazy ("cache stampede") factory evaluation requirement is pretty fundamental. Personally I struggle to find use cases that warrant a cache without lazy factory. Either not needing a lazy factory shows you don't actually need a cache at all (aka premature optimisation), or you need a cache such that it must have a lazy factory to not overwhelm your system. My experience says it should be lazy factory by default, and lazy factory is what people expect when using a cache for the first time. But I totally accept I'm biased as the author of a library that does just that! I would like to hear from use cases that need a cache without lazy factory behaviour.

This does bring up the point that, if there is a good way to add some additional abstractions to make Lazy evaluation simple and efficient, that would be great.

At the same time, I know of at least two larger OSS libs that may be able to benefit from a more raw base abstraction. (in one case I saw equivalent performance with lower memory usage compared to a very nice, hand crafted robin hood hash)

And a couple of questions:

* What is the difference between the proposal and the existing open source Bitfaster library? Could that be improved instead? Why not just support and direct people to existing existing open source community work if they need the highest performance cache?

If I had to guess, from my own experience hacking at it: BitFaster is a very good caching library; however, if you only need a portion of the functionality, it is better to specialize the implementation and you can get even more out of it. Kinda per above: on average it's really good, but a purpose-trimmed and tuned 'general but fixed use case' version is competitive with a 'hand chosen and written algo' cache.

to11mtm avatar Dec 02 '23 02:12 to11mtm

From my own experience hacking at it, Bitfaster is a very good caching library, however if you need a portion of the functionality it is better to specialize the implementation you can get even more from it.

I am not familiar with it but was contributing to "Bitfaster" considered as an alternative? (Can we cc their owners here?)

danmoseley avatar Dec 02 '23 03:12 danmoseley

@alastairtree I believe the value of a Lazy model is for a small cache with a high likelihood of concurrent accesses to the same state. But in a high traffic service which may have millions of entries in cache, the risk of wasted work due to concurrent evaluation of the factory method is low enough to simply not be a concern.

However, if it is a concern, then the cache can simply be configured to hold Lazy<T> and then it will work as needed and prevent redundant calls to the factory method. I think that seems like a good place to be.

Now even if your cache sometimes experiences collisions and does redundant work as a result, it doesn't mean use of Lazy<T> is warranted. It depends on the observed rate of collisions. Using a Lazy<T> imparts a definite measurable cost to every use of the cache, whereas collisions only cost you when they occur. Thus a low collision rate may be cheaper than using Lazy<T>.

geeknoid avatar Dec 02 '23 07:12 geeknoid

I am excited about the introduction of a new memory cache and will follow its final specification if it proceeds. My experience with caching is not detailed enough to properly take part in the current level of conversation but I am happy to provide input and feedback as the design and first iteration progresses. Picking Bitfaster and extending/improving it sounds like the right plan. The current proposal appears to nearly match the existing Bitfaster or am I missing some key items?

RobDeVoer avatar Dec 02 '23 08:12 RobDeVoer

@RobDeVoer It's very similar on the inside, but the API surface has been considerably simplified by hiding all the complications around how monomorphisation is achieved. And we have some ideas in the pipeline which may deliver additional raw performance to the code as well, but those are still experimental in nature.

geeknoid avatar Dec 02 '23 20:12 geeknoid

Hi all, here are my thoughts on this effort. I tried to condense all of them into a single answer, so sorry for it being quite long.

First of all, as others already said, thanks for discussing this in public to gather the community's opinions, including the ones by OSS maintainers like me, @alastairtree , @Turnerj and others. Really appreciated!

The most important thing I'd like to highlight is that when this is finalized and released (I suppose in the .NET 9 timeframe), it will be important to clearly explain its positioning. As already discussed on the previous issue, caching is a tough beast with quite different use cases, the 2 primary ones being:

  • low-level caching: basically a dictionary + some form of expiration, where the data is typically homogeneous (of the same type), typically privately owned by a single piece of code (eg: internally by libraries like EF), where the data is not shared app-wide, the type of the cache key should be anything (so no boxing, etc) and with a limited feature set. Also typically there will be a lot of different instances of it
  • app-level caching: is still caching, but more "high level", where the data may be heterogeneous (of different types), the data is (or may be) shared between different pieces of code, the type of the key is typically string to allow for some higher level features like prefix/suffix addition for automatic scoping/less collisions and with a generally richer feature set, with typically few instances of it

Following this distinction I see this new RCache as part of the 1st category, and in this sense it feels good to me: the scope is quite narrow, no extra high-level features and the attention to micro-benchmarking is quite high. FWIW, good 👍

RCache

As pointed out by others, the name is quite strange: I feel you in wanting to avoid some other uber general/all-encompassing name and to avoid confusion with the other 2 existing memory caches, so that this also feels "smaller/specialized/optimized" (again, the 1st category from above). Having said that, naming it RCache just because the project where it originated started with an "R"... I don't know. Maybe something that communicates the low-level/no-frills nature of it, like RawCache or similar? Mind you, it's probably a minor thing, but still wanted to point that out.

  • Both caches implementations are using background threads / timers. Our implementation maintains its state on set method, thus sparing some threadpool overhead over existing libraries. Our implementation is also friendlier to use, since it won't create a memory leak when not disposed (compared to System.Runtime.Caching which allocates timers on ctor).

Understood, but to do this it seems you are not cleaning up automatically: not having the code to look at I'm wondering if a SET call of key "foo" will also cleanup the expired entry for cache key "bar". If this is not the case you may run into memory remaining allocated by cache entries that ended up not being touched for a long time, so this is something to look out for. A different approach may be to include timed background cleanup but as optional, with something like a bool flag in the general options or a TimeSpan? (null being the default, so no auto-cleanup). This way by default no leaks will be guaranteed but, if wanted, the ease of it can be obtained. I think the related code should not be that much.

** RCache is disposing entries on eviction and it can be used as eviction callback.

The feature is nice, but watch out for this (been there): I would enable automatic Dispose calls on eviction only as an opt-in feature (like bool EnableAutoDisposeOnEvict), since sometimes I saw usages in the wild where something was fetched from the cache and kept in use, only to be randomly disposed while still in use, because of this exact behavior.

API Proposal ...

In general it looks quite good to me: I like that it is an abstract class with different factory methods so that currently there's only the LRU special case but I can imagine in the future there may be others (LFU, TLRU, PLRU, MRU, RR, FIFO, etc).

A couple of notes though:

  • factory methods: why 2 of them instead of only one with name + options and options being nullable? If tomorrow a new param param3 would arrive, that may generate new permutations (name + param3, name + options + param3, etc). Nothing major, just wondering
  • named caches: thinking about the DI support, I assume calling CreateLru<TKey, TValue>("foo") multiple times will return the same instance right? What about calling it twice with the same name but different options? Or with the same name but with different types? Is the name unique per-types-combination or global?
  • options: RCacheLruOptions<TKey, TValue> why the generic typing here? Currently there does not seem to be anything justifying that, and as it is it would prevent sharing the same options with different instances. Maybe it is to make it future proof? Admittedly the cost here is nothing, so again just wondering
  • ExtendedTimeToEvictAfterHit vs ExtendTimeToEvictAfterHit: I suggest to use a Is/Enable/etc prefix for bool options. In this case specifically the 2 options looks quite identical, so by using something like EnableExtendedTimeToEvictAfterHit it would make it clearer which one is which
  • methods overloads: I see different overloads with/without optional params (like 4 different GetOrSets). I understand the need but down the line it may be quite daunting to maintain/evolve with this approach
  • factory signature: you are using the specific params currently used, I can imagine to avoid bloat. While I feel that, I also can see a problem in the future when new factory params/return values may be needed, and then you would need to add more overloads and handle them all and the permutations will become quite a lot (been there...)
  • the bool TryGet(out ...) signature will not be usable in case one day you'd like to add async support, so you'll eventually end up with 2 different methods with 2 very different signatures for the sync and async versions. I'd suggest to start immediately with a signature that will be able to support async. In my project I have used a simple MaybeValue<T> struct, kinda like a Nullable<T> but for both ref/value type (like a maybe/option monad). It works well, is not heavy on resources and is easily evolvable into the async territory
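
For reference, a minimal sketch of what such a struct could look like (not the actual type from any existing library):

public readonly struct MaybeValue<T>
{
    public bool HasValue { get; }
    public T Value { get; }          // meaningful only when HasValue is true

    private MaybeValue(T value) { HasValue = true; Value = value; }

    public static MaybeValue<T> None => default;
    public static MaybeValue<T> Some(T value) => new(value);
}

// A sync and an async API could then share the same shape, e.g.:
// MaybeValue<TValue> TryGet(TKey key);
// ValueTask<MaybeValue<TValue>> TryGetAsync(TKey key);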

Another thing is that, as already said by others, the lack of cache stampede prevention is quite a big thing. I can feel not wanting to go there, but at the same time it's something quite important and expected, and something that is already missing in the current MemoryCache (with the added "bonus" there that people usually think the GetOrCreate ext method will protect them, when instead it will not).

Because of this, if you decide not to implement it, I think it should be very clear to consumers that there's no protection there.

Also, now that I'm thinking about it: if no stampede prevention is provided, and no async support is provided (see later), why having a GetOrSet at all?

By not having it:

  • it would make it clear that there's no stampede prevention
  • it would avoid having to support its different overloads (including the TState support)

I mean, I understand it's an attractive method to have, but it may be counterproductive.

Alternative Designs

Callbacks and notifications [...] Instead of storing a callback with each entry or some change tokens we could use System.Threading.Channels [...] I expect this design to be more memory/cpu efficient than existing one

Oh yes please: if you decide to go with the callbacks route, a single stream of events would definitely be better than a ton of specific callbacks stored inside each cache entry.

Size check We could implement item size limit through interface implemented on stored type by library client.

public interface ISizer
{
   int GetSizeInBytes();
}

This design would allow us to implement size check without boxing TValue or introducing CPU branches on hot-path.

This may be a nice idea: in case it goes forward, I think it should be explained more and experimented with, including some practical examples of how to implement it.

Asynchronous interface

We wanted to be explicit that everything in RCache should be sync.

Understood, but as said by providing a GetOrSet method with a factory passed to it, I think users will expect both of these:

  • stampede is taken care of
  • both sync/async are supported

Anyway fingers crossed for you: I hope it will be enough for the community... but I'm not that sure.

This approach allows us not to go into distributed systems problems, since async caches requires much richer semantics.

Eh, definitely, I feel that 😅

We are happy though to add 'TryAdd' to our proposal, to avoid stampedes for async factories.

How would a TryAdd method solve the stampede for async operations? If there's stampede prevention, it should be in the normal GetOrSet method(s) available, maybe optionally (like the new output cache in aspnet).

We are going to propose another interface in the future that covers asynchronous caches scenarios and more.

I'm interested in this, will follow!

Enumeration We decided not to implement enumeration on RCache:

Good call!

A couple of final things.

This looks good for caches of homogeneous values, but for heterogeneous values? Would we just use RCache<TKey, object> and then cast the result or is there something else to look out for?

Capacity: what happens when the Capacity is reached? I think I've missed the explanation.

Regarding logging: is there a plan to support logging or is it way too overkill at such a low level? I can imagine the latter, but just asking.

Ok, I think it's all.

Thanks again for sharing this with us!

jodydonetti avatar Dec 03 '23 17:12 jodydonetti

Thanks for your feedback. :-)

when this will be finalized and released (I suppose in the .NET 9 timeframe)

If this is going to end up in dotnet/extensions, then we have a 1-month release cadence for this repository.

Following this distinction I see this new RCache as part of the 1st category, and in this sense it feels good to me: the scope is quite narrow, no extra high-level features and the attention to micro-benchmarking is quite high. FWIW, good 👍

I agree with this classification and how you position RCache there. Application level cache IMHO must have much richer semantics to support all those features and it affects overall performance. Lower level cache can be also used as a building block for creating app level caches.

Maybe something that communicates the low-level/no-frills nature of it, like RawCache or similar?

I am not very opinionated on the name. I like RCache since it is short and quite unique. If it lives in an appropriate namespace and is well documented, I don't see the object name being a point of confusion. Probably whatever name we choose, there will be somebody that doesn't like it, so I will leave this decision to you.

Understood, but to do this it seems you are not cleaning up automatically: not having the code to look at I'm wondering if a SET call of key "foo" will also cleanup the expired entry for cache key "bar". If this is not the case you may run into memory remaining allocated by cache entries that ended up not being touched for a long time, so this is something to look out for. A different approach may be to include timed background cleanup but as optional, with something like a bool flag in the general options or a TimeSpan? (null being the default, so no auto-cleanup). This way by default no leaks will be guaranteed but, if wanted, the ease of it can be obtained. I think the related code should not be that much.

You are correct with the first three sentences. We may leave uncleared memory when the cache is untouched for a long time. This is the reason our interface has the RemoveExpiredEntries method, so that if you know this might be your case, you can implement automatic cleanup easily. We can also create some helpers that do it for you. The LRU implementation that we have now does not use background threads, but the LFU implementation that we will come up with next will. One of the core ideas of this proposal is that this is not just one implementation - this is the first one.
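
A sketch of one such helper, built only on the proposed RemoveExpiredEntries (the PeriodicTimer loop is just an assumed shape, not a committed design):

public static async Task RunPeriodicCleanupAsync<TKey, TValue>(
    RCache<TKey, TValue> cache,
    TimeSpan interval,
    CancellationToken token) where TKey : notnull
{
    // Opt-in background cleanup for caches that are mostly read and rarely written.
    using var timer = new PeriodicTimer(interval);
    while (await timer.WaitForNextTickAsync(token))
    {
        cache.RemoveExpiredEntries();
    }
}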

The feature is nice, but watch out for this (been there): I would enable automatic Dispose calls on eviction only as an opt-in feature (like bool EnableAutoDisposeOnEvict), since sometimes I saw usages in the wild where something was fetched from the cache and kept in use, only to be randomly disposed while still in use, because of this exact behavior.

That's great experience. I will add this to the proposal. Thanks a lot!

factory methods: why 2 of them instead of only one with name + options and options being nullable? If tomorrow a new param param3 would arrive, that may generate new permutations (name + param3, name + options + param3, etc). Nothing major, just wondering

I expect the 'Options' parameter to absorb new features instead of adding new permutations to the factory method. This is just a preference decision, since nullable is semantically equivalent to what we have now. I think interfaces that require passing null explicitly are just longer to write and not always clear to interpret.

named caches: thinking about the DI support, I assume calling CreateLru<TKey, TValue>("foo") multiple times will return the same instance right? What about calling it twice with the same name but different options? Or with the same name but with different types? Is the name unique per-types-combination or global?

  • The RCache factory is a factory, so creating a cache with the same name twice will return different instances. (Not using DI.)
  • Using 'AddLruRCache' on the ServiceCollection with the same name multiple times will configure one instance of RCache that is unique in DI per name and types. We follow the .NET options pattern, so each configuration applies in sequence.
  • As stated above, name + types is the differentiator. A cache with the same name and different types will have its own instance.

So to summarize, DI is global per name and key/value type combination. Creating an 'RCache' by hand gives you no protection.

options: RCacheLruOptions<TKey, TValue> why the generic typing here? Currently there does not seem to be nothing justying that, and as it is it would avoid sharing the same options with different instances. Maybe is it to make it future proof? Admittedly the cost here is nothing, so again just wondering

  • By having RCacheLruOptions follow the same generic parameters as RCache itself, you can have a shorter signature when creating a cache through the factory, since the types will be inferred.
  • This is future proof: if we ever want a TValue-specific option in the options, we don't need to make a breaking change. (Example: a func for calculating TValue size in options.)

ExtendedTimeToEvictAfterHit vs ExtendTimeToEvictAfterHit: I suggest using an Is/Enable/etc prefix for bool options. In this specific case the 2 options look almost identical, so using something like EnableExtendedTimeToEvictAfterHit would make it clearer which one is which.

Noted, will think about clearer naming.

method overloads: I see different overloads with/without optional params (like 4 different GetOrSets). I understand the need, but down the line it may be quite daunting to maintain/evolve with this approach.

Could you give an example of when maintaining/evolving it is daunting?

factory signature: you are using the specific params currently needed, I imagine to avoid bloat. While I sympathize with that, I can also see a problem in the future when new factory params/return values may be needed: you would then need to add more overloads and handle them all, and the permutations become quite a lot (been there...).

If I understand correctly, you think that the RCache factory may need to return some wrapper over TValue and TTL and accept different parameters than TKey and TState?

I think that having TState allows you to pass any parameter type, and returning TValue lets you return any type of value. Could you share your experience with the issues you had with permutations?

the bool TryGet(out ...) signature will not be usable if one day you'd like to add async support, so you'd eventually end up with 2 very different signatures for the sync and async versions. I'd suggest starting immediately with a signature that will be able to support async. In my project I have used a simple MaybeValue<T> struct, kinda like a Nullable<T> but working for both ref/value types (like a maybe/option monad). It works well, is not heavy on resources and evolves easily into async territory.
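For context, a minimal sketch of what such a struct might look like. This is illustrative only, not FusionCache's actual MaybeValue<T>, and the TryGet/TryGetAsync signatures in the trailing comment are hypothetical:

```csharp
// Illustrative only: a Nullable<T>-like wrapper usable for both reference and value types,
// so sync and async lookups can share the same return shape.
public readonly struct MaybeValue<T>
{
    private MaybeValue(T value)
    {
        HasValue = true;
        Value = value;
    }

    public bool HasValue { get; }

    // Holds default(T) (e.g. on a cache miss) when HasValue is false.
    public T Value { get; }

    public static MaybeValue<T> None => default;

    public static MaybeValue<T> Some(T value) => new(value);
}

// Hypothetical symmetric signatures this shape would enable:
// MaybeValue<TValue> TryGet(TKey key);
// ValueTask<MaybeValue<TValue>> TryGetAsync(TKey key);
```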

RCache will never have async support. This is explicit design decision and proposal is not supposed to evolve in this direction.

Also, now that I'm thinking about it: if no stampede prevention is provided, and no async support is provided (see later), why have a GetOrSet at all?

As explained above by Martin, stampedes can easily be prevented by returning Lazy<T> from the cache. For big services the collision rate is small, so even when the cost of creation is high you don't need this protection.
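A minimal sketch of the Lazy<T> approach being referenced, using ConcurrentDictionary as a stand-in for the cache (the names here are illustrative, not part of the proposal):

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

var cache = new ConcurrentDictionary<string, Lazy<string>>();
int factoryRuns = 0;

// GetOrAdd may create several Lazy wrappers under contention, but only one gets published,
// and Lazy<T>'s default ExecutionAndPublication mode runs the expensive body only once.
string GetOrSet(string key) =>
    cache.GetOrAdd(key, k => new Lazy<string>(() =>
    {
        Interlocked.Increment(ref factoryRuns);
        return $"value-for-{k}"; // expensive work would go here
    })).Value;

Parallel.For(0, 100, _ => GetOrSet("hot-key"));
Console.WriteLine(factoryRuns); // 1: no stampede, even with 100 concurrent callers
```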

Understood, but as I said, by providing a GetOrSet method with a factory passed to it, I think users will expect both of these:

ConcurrentDictionary does not support async and does not prevent stampedes but still has GetOrAdd. Why would users expect different behavior from RCache than from ConcurrentDictionary?

How would a TryAdd method solve the stampede for async operations?

I've made a mistake, it won't. I've removed this part from the proposal.

This looks good for caches of homogeneous values, but what about heterogeneous values? Would we just use RCache<TKey, object> and then cast the result, or is there something else to look out for?

This is one possible solution.

Capacity: what happens when the Capacity is reached? I think I've missed the explanation.

When capacity is reached, nothing happens by itself. When an item is added to a cache at full capacity, the cache will evict one entry in approximate LRU order.

Regarding logging: is there a plan to support logging or is it way too overkill at such a low level? I can imagine the latter, but just asking.

I think it would be overkill, but we could add optional logging for diagnostic purposes that would be turned off by default. The question is: what would the log lines be about? If the idea is to log TKey and TValue, there is the problem of not capturing any sensitive data.

rafal-mz avatar Dec 04 '23 11:12 rafal-mz

Small remark. I realize that we aim primarily for the high-perf, low-allocation scenarios (like ConcurrentDictionary). That is a good call. I might use it for many of my ConcurrentDictionaries.

I am a big fan of the TValue GetOrSet<TState>(TKey key, TState state, Func<TKey, TState, TValue> factory); overload, which avoids allocating a closure over the state the factory might need.

I just hope that it will not be implemented as it is in the BitFaster caching repo, like this: V GetOrAdd<TArg>(K key, Func<K, TArg, V> valueFactory, TArg factoryArgument) => this.GetOrAdd(key, k => valueFactory(k, factoryArgument));

This line indeed creates a closure over valueFactory and factoryArgument. Sorry if that is obvious, just wanted to make sure.

Richard004 avatar Dec 04 '23 15:12 Richard004

No, it's not implemented like that. This overload was specifically added to avoid closure allocation.
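To illustrate the difference being discussed, here is a small sketch using ConcurrentDictionary, which has the same state-passing GetOrAdd shape; this is not RCache code:

```csharp
using System;
using System.Collections.Concurrent;

var map = new ConcurrentDictionary<int, string>();
string prefix = "user-";

// Closure form: the lambda captures `prefix`, so the compiler emits a closure and
// allocates a fresh delegate every time this line runs.
map.GetOrAdd(1, k => prefix + k);

// State-passing form: `prefix` travels as the factoryArgument, so the lambda captures
// nothing, can be marked static, and its delegate instance can be cached by the compiler.
map.GetOrAdd(2, static (k, p) => p + k, prefix);

Console.WriteLine(map[1]); // user-1
Console.WriteLine(map[2]); // user-2
```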

rafal-mz avatar Dec 04 '23 16:12 rafal-mz

If this is going to end up in dotnet/extensions, then we have a 1-month release cadence for that repository.

Ah, probably even better.

I agree with this classification and how you position RCache there. An application-level cache IMHO must have much richer semantics to support all those features, and that affects overall performance. A lower-level cache can also be used as a building block for creating app-level caches.

Exactly, that's the idea!

Probably whatever name we choose, there will be somebody who doesn't like it

Yep, most definitely 🤷

so I will leave this decision to you.

Ahah thanks, but I'm not a Microsoft employee :-)

Anyway, another idea that just came up: MemoryCacheSlim. It's a known and common naming pattern in .NET: we already have ManualResetEvent and ManualResetEventSlim, Semaphore and SemaphoreSlim, etc, and each "slim" variant is generally understood to be the "lightweight version with somewhat fewer features" of the "full" one. On the other hand, maybe in this case the 2 designs (MemoryCache and this new one) would be quite different, so this convention may not apply. I don't know, just spitballing here at the moment for everybody to think about.

You are correct with the first three sentences. We may leave uncleared memory when the cache is untouched for a long time. This is the reason our interface has a RemoveExpiredEntries method, so that if you know this might be your case you can implement automatic cleanup easily. We can also create some helpers that do it for you. The LRU implementation that we have now is not using background threads, but the LFU implementation that we will come up with next will.

Understood, and I like having the ability to not have stuff running in the background, and being able to fully control calls to RemoveExpiredEntries manually. Having said that, I still think opt-in support to do it automatically, with the implementation already provided, would be something people would like, and the code may be small in theory. But of course I may be wrong here, and without looking at the code I'll wait and see.

Btw I later noticed that the abstract class already implements IDisposable, so I was wondering about the rationale here: without a timer there would not be a potential timer leak, got it, but what is IDisposable here for then? Might there still be leaks of something else?

One of the core ideas of this proposal is: this is not just one implementation. This is the first one.

Totally got it, and I really like it!

That's great experience. I will add this to the proposal. Thanks a lot!

You're welcome!

I expect the 'Options' parameter to absorb new features instead of adding new permutations to the factory method.

Makes sense. I still have a doubt about what can actually go into options, when we also consider the IOptions<T> pattern and the way it can naturally be configured via the general config system (so JSON deserialization and stuff), and I have the same concern in FusionCache, too. Some time ago I was wondering about this with both Davids (Fowler and Pine) in an email thread: should sub-components like an ITimeProvider or similar (so not just plain values) be put directly into an options object or into separate ctor params? I never got to a definitive answer about it. Do you see a hypothetical ITimeProvider or other non-plain values going into the options class itself?

This is just a preference decision, since nullable is semantically equivalent to what we have now. I think that interfaces that require passing null explicitly are just longer to write and not always clear to interpret.

Agree, but I usually also declare the nullable params with a default value of null, to get the best of both worlds. Anyway it's more of a stylistic preference probably, so I get the different vibes.

  • The RCache factory is just a factory, so creating the cache with the same name twice will return a different instance. (Not using DI.)
  • Using `AddLruRCache` in ServiceCollection with the same name multiple times will configure one instance of RCache that is unique in DI per name and types. We are following the .NET options pattern, so each configuration will apply in sequence.
  • As stated above, name + types is the differentiator. A cache with the same name and different types will have its own instance.

Yeah, sorry: after posting my comment I got to it myself, sorry for the time wasted.

So to summarize, in DI the cache is global per name and key/value type combination. Creating 'RCache' by hand gives you no protection.

Makes sense, thanks.

  • This is future proof: if we ever want a TValue-specific option in the options, we don't need to make a breaking change. (Example: a func for calculating TValue size in options.)

Right, makes sense.

Could you give an example of when maintaining/evolving it is daunting?

Eh, (un)luckily yes 😅: see here, here and here. They are the same class (split into 3 partial files for organization purposes), and they represent the permutations of all the optional params and signatures. I started with some, and then added support for more params, so I had to add all the possible combinations of params to keep the thing aligned with the general approach. Maybe this will not apply here, and maybe it's not so bad (I had to do it once), but I preferred to warn you guys about this; maybe you can take some inspiration on what to potentially avoid.

If I understand correctly, you think that the RCache factory may need to return some wrapper over TValue and TTL and accept different parameters than TKey and TState? I think that having TState allows you to pass any parameter type, and returning TValue lets you return any type of value. Could you share your experience with the issues you had with permutations?

For the return types of the factory lambda, adding new overloads with new lambda signatures is a pain, whereas adding a new prop to an existing return type is much easier and will not explode into a lot of variations. On the other hand I can understand the potential extra allocations even when not needed, but it's still something to look out for.

As an exercise, I suggest trying to keep the current design and seeing what would need to change in the future by adding an extra return type: not just TValue or (TValue, TimeSpan), but also the Hotness thing you mentioned, let's say to let consumers configure how important something is (it's just an example). How would that work down the line?

RCache will never have async support. This is explicit design decision and proposal is not supposed to evolve in this direction.

Ok, I was referring to this passage "We are going to propose another interface in the future that covers asynchronous caches scenarios and more" and thinking about consistency overall.

As explained above by Martin, stampedes can easily be prevented by returning Lazy<T> from the cache.

Yes, as with ConcurrentDictionary, but then on the consumer side I've seen a lot of people wanting to avoid always having to do dict[key].Value etc and ending up creating their own LazyConcurrentDictionary or whatever to shield themselves from having to work with Lazy<T> directly every time. Also, they need to understand LazyThreadSafetyMode. Just my two cents.

For big services the collision rate is small, so even when the cost of creation is high you don't need this protection.

Mmmh, this sounds strange: I had members of the FusionCache community telling me that just by swapping MemoryCache for FusionCache they saw somewhere around 50% fewer calls to the database. Of course this wholly depends on the access patterns involved, so it may very well depend on a lot of factors. And anyway you surely have way more field data than me 😅 so I trust you guys here.

ConcurrentDictionary does not support async and does not prevent stampedes but still has GetOrAdd. Why would users expect different behavior from RCache than from ConcurrentDictionary?

I'm going out on a limb here maybe, but 2 things come to mind:

  1. it's not that they have different expectations; in my experience, most of the time they expect protection in the dictionary too, and are bitten down the line when they discover that they don't have it. Of course there's the whole RTFM thing, but I'm just pointing this out after having seen it multiple times
  2. the thing is called cache stampede, and this is a cache, so expectations may differ

This is one possible solution.

Got it, thanks!

When capacity is reached, nothing happens by itself. When an item is added to a cache at full capacity, the cache will evict one entry in approximate LRU order.

Got it.

I think it would be overkill, but we could add optional logging for diagnostic purposes that would be turned off by default.

As it is today, logging is probably overkill, I agree. I was just thinking ahead. Anyway, if it may be useful to you and the team, have a read here, as I've been there in the past; maybe it can help with public expectations. In the end, the standard way of configuring the min log level (via namespace prefix etc) ended up being the right call, imho, but I'm still wondering if a global (meaning per-instance) bool flag would be a nice addition, to avoid inadvertently enabling logging for the cache, and for every cache instance. Since here we are driving for max perf, a nice little bool flag (defaulting to false) to explicitly enable it may be a nice addition, on top of receiving an ILogger instance, which would happen automatically via DI.

If the idea is to log TKey and TValue, there is the problem of not capturing any sensitive data.

Good call, haven't thought about it.

The question is: what would the log lines be about?

In FusionCache there can be a lot of stuff going on: async calls, automatic background factory completion, distributed cache + backplane communications, eager refresh and so on, so logging becomes quite fundamental when my users go into detective mode (and for my own sanity too, when developing new features 🤣). But here, in such a low-level + hi-perf case, probably it's not worth it, you're right.

Anyway... woah, what a good discussion to have, thanks again, I really appreciate it!

jodydonetti avatar Dec 04 '23 21:12 jodydonetti

Ahah thanks, but I'm not a Microsoft employee :-)

I meant to leave this decision to the community and reviewers. For me it's just a name.

Btw I later noticed that the abstract class already implements IDisposable, so I was wondering about the rationale here: without a timer there would not be a potential timer leak, got it, but what is IDisposable here for then? Might there still be leaks of something else?

If the cache holds TValues implementing IDisposable, it will dispose each of them when the cache itself is disposed. The class also holds a System.Diagnostics.Metrics.Meter, which is IDisposable.

As an exercise, I suggest trying to keep the current design and seeing what would need to change in the future by adding an extra return type: not just TValue or (TValue, TimeSpan), but also the Hotness thing you mentioned, let's say to let consumers configure how important something is (it's just an example). How would that work down the line?

The presented model is general and cannot make any assumptions about the underlying algorithm. If we returned 'hotness' or 'priority', it would mean that all of the implementations must have a notion of prioritization. IMHO, to specify such details correctly one needs a deep understanding of the implementation details, the algorithm used, the data access patterns, and extensive measurements. So, I would call it a leaky abstraction.

There are some algorithm details which may be useful for tuning the cache for high performance. Such details (if exposed) will be part of the options object for the specific implementation; in this case that would be RCacheLruOptions<TK, TV>.

Do you see a hypothetical ITimeProvider or other non-plain values going into the options class itself?

I believe that anything like ILogger or TimeProvider should go into the ctor and be taken from DI. First, it allows configuring cross-cutting concerns once and letting the whole system 'pick them up' automatically. Second, it keeps the contract implicit, so it is less work for the library developer to maintain the interface. Third (this is just my impression, not backed by data), I think the configuration model is not that well understood in the community, and folks often build the service provider multiple times just to pass some class instance into options.

Funny enough, I think in the case of the time provider and RCache it will be taken from the options as an instance. The reason for that is that we have a public factory that doesn't know about the DI container.
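A hedged sketch of what that could look like; the options type and its properties below are hypothetical (not the proposed API). The idea is simply that the options object carries a TimeProvider instance that defaults to the system clock, so the DI-unaware factory path still works and tests can swap in a custom clock.

```csharp
using System;

// Hypothetical options shape, for illustration only.
public sealed class HypotheticalLruOptions<TKey, TValue>
{
    public int Capacity { get; set; } = 1_024;

    // Defaults to the system clock; tests or DI wiring can replace it with another instance.
    public TimeProvider TimeProvider { get; set; } = TimeProvider.System;
}

public static class Demo
{
    public static void Main()
    {
        var options = new HypotheticalLruOptions<Guid, string>();
        Console.WriteLine(options.TimeProvider.GetUtcNow()); // system time unless overridden
    }
}
```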

I've seen a lot of people wanting to avoid always having to do dict[key].Value etc and ending up creating their own LazyConcurrentDictionary or whatever to shield themselves from having to work with Lazy<T> directly every time. Also, they need to understand LazyThreadSafetyMode.

I think (again, as Martin already mentioned) we want people to pay for what they use. I don't like the idea of slowing down everybody by default to shield less experienced folks from shooting themselves in the foot. I would love to do both at the same time. :-)

the thing is called cache stampede, and this is a cache, so expectations may differ

That's fair. I will think about it.

rafal-mz avatar Dec 04 '23 21:12 rafal-mz

Given that this is being pitched as a low-level collection, and builds upon the concurrent dictionary, has any thought been put into placing it in System.Collections.Concurrent and calling it ConcurrentExpiringDictionary?

I know this loses the "cache" in the name, but maybe that is no bad thing, because it differentiates it from the existing caches and makes clear this is primarily designed to be a low-level collection and not an application-level cache.

Also it has the bonus that it's a more descriptive name that fits in more closely with the rest of the naming style in the framework.

alastairtree avatar Dec 05 '23 00:12 alastairtree

I am the original author of ConcurrentLru on which RCache is based – so I can give some context on the design goals of the underlying algorithm and its performance characteristics. It’s also worth noting that ConcurrentLru is already used in many Microsoft production services serving tens of billions of requests per day - it’s battle tested and was born out of a practical need to replace MemoryCache to improve performance and stability.

The high-level design goals were along the lines of:

  • Optimize for throughput, in particular the cache hit ‘hot path’. In the services I work on, it is common to get very high frequency bursts of traffic accessing the same key. Making this lock free and fast under load gives us outsized gains.
  • Optimize hit rate, mitigate scanning. It’s common for downstream services to send us requests to check if there is work to do. Such services tend to for-loop over all the instances in the world waking them up and loading caches. This leads to ‘one hit’ items evicting hot items from MemoryCache, reducing hit rate and/or triggering memory exhaustion.
  • Mitigate the ‘memory thrashing’ failure mode. MemoryCache has a tendency to evict items that are in use causing them to be loaded again in a cycle. If the items are large/complex, this can quickly cause memory pressure/excessive GC. Once the CPU is pegged and there is a high influx of new items, the eviction thread may not run promptly causing a cascading failure as the machine runs out of memory and dies. I have seen multiple variants of this in production systems.
  • Pay for play. MemoryCache imposes memory and latency costs for all features even when they are not used. In a large scale service memory costs money and reducing the per item overhead makes a difference for large caches. We aggressively shrink everything because smaller heaps lead to faster GC and lower COGs.
  • Generic keys. It’s very common for us to use Guids or pairs of Guids to identify things, and boxing allocations on lookups should be avoided.

Not an exhaustive list – just to give an idea: the focus was on having a strictly bounded-size cache that keeps the most useful items, suitable for use in the hot path of a high-scale web application.

The bounded size part is important: cache management is amortized within cache writes by performing constant-time operations without locks, and unlike MemoryCache it will not exceed capacity under load and doesn't require a background thread. Instead, foreground threads cooperate to keep the cache within bounds using a variant of the 2Q algorithm (similar to that used in Memcached and OpenBSD). Implementing this algorithm efficiently without a background maintenance thread is, to my knowledge, novel.
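For intuition only, here is a heavily simplified, single-FIFO-queue sketch of the "maintenance amortized into writes" idea. This is not the actual three-queue 2Q variant and not ConcurrentLru code; all names are made up for illustration:

```csharp
using System.Collections.Concurrent;

// Single-queue sketch: each successful write enqueues the key and, when over capacity,
// removes the oldest key. No timer, no background thread; writers do the trimming.
public sealed class BoundedFifoCache<TKey, TValue> where TKey : notnull
{
    private readonly int _capacity;
    private readonly ConcurrentDictionary<TKey, TValue> _map = new();
    private readonly ConcurrentQueue<TKey> _order = new();

    public BoundedFifoCache(int capacity) => _capacity = capacity;

    public bool TryGet(TKey key, out TValue value) => _map.TryGetValue(key, out value!);

    public void Set(TKey key, TValue value)
    {
        if (_map.TryAdd(key, value))
        {
            _order.Enqueue(key);

            // Constant work per write: evict at most one entry when over capacity,
            // so the cache stays bounded without a maintenance thread.
            if (_map.Count > _capacity && _order.TryDequeue(out var victim))
            {
                _map.TryRemove(victim, out _);
            }
        }
        else
        {
            _map[key] = value; // update in place; recency tracking omitted for brevity
        }
    }
}
```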

A great deal of effort and tuning has been put into maximizing throughput and hit rate, and the approach is well suited to modern web servers. There are tradeoffs, however, and a couple of caveats are worth mentioning:

  • Since the algorithm is based on 3 concurrent queues, on machines with higher CPU count the queues can become a bottleneck when many threads contend on the head/tail of each queue. This will manifest as high CPU consumption due to spin waits inside ConcurrentQueue.Enqueue/Dequeue. That might sound scary, but in practice this can only really occur when there are constant very high frequency cache misses, and the cache factory does almost no computation. Cache factories do computational work even when throwing an exception and consequently it is hard to provoke this in the wild. For desktop/container use, this is a very unlikely failure mode. For a monolithic web app that runs on a 64 CPU VM more care is needed.
  • When used with time-based expiry, the cache will tend to hold onto items. If there are no writes, nothing will be evicted. This can be mitigated by periodic expiry on a background thread. On demand expiry is an O(n) operation since we depend on FIFO queues. The queues are optimized for concurrency, so it’s a tradeoff where we chose to favor throughput with strict bounded size.
  • It is not a pure LRU, and items are not evicted in LRU order. The cache can be in either a cold or warm state depending on queue fullness and will exhibit different patterns of eviction in each state. This is not intuitive and can cause confusion/alarm when replaying simple sequences of lookups and observing the results. The relative sizes of the internal queues have been carefully tuned to optimize hit rate from real world production traffic.

bitfaster avatar Dec 05 '23 20:12 bitfaster

Small remark. I realize that we aim primarily for the high-perf, low-allocation scenarios (like ConcurrentDictionary). That is a good call. I might use it for many of my ConcurrentDictionaries.

I am a big fan of the TValue GetOrSet<TState>(TKey key, TState state, Func<TKey, TState, TValue> factory); overload, which avoids allocating a closure over the state the factory might need.

I just hope that it will not be implemented as it is in the BitFaster caching repo, like this: V GetOrAdd<TArg>(K key, Func<K, TArg, V> valueFactory, TArg factoryArgument) => this.GetOrAdd(key, k => valueFactory(k, factoryArgument));

This line indeed creates a closure over valueFactory and factoryArgument. Sorry if that is obvious, just wanted to make sure.

@Richard004 Not related to this discussion but worth clarifying: the quoted code above is the default implementation of the interface method that exists for backwards compatibility (in other words, if you had implemented ICache yourself against an earlier version of my library, you will not have this new method and will fall back to this default). The actual implementation on ConcurrentLru of course does not allocate a closure.

bitfaster avatar Dec 05 '23 20:12 bitfaster

From my own experience hacking at it, BitFaster is a very good caching library; however, if you only need a portion of the functionality, it is better to specialize the implementation: you can get even more out of it.

I am not familiar with it but was contributing to "Bitfaster" considered as an alternative? (Can we cc their owners here?)

@danmoseley I of course would welcome any contributions and would like to see my library help folks to build better software. Related to the point above, I work hard to preserve backwards compatibility because I work on services spread across many repos and breaking changes are painful to absorb.

So I think having a solid option built into the framework is valuable. Anyone who has had to update Newtonsoft.Json in a complex multi-repo service will understand why a solid framework option benefits the ecosystem and can be a huge time saver.

bitfaster avatar Dec 05 '23 21:12 bitfaster

The only thing this is missing for it to be a drop-in replacement for the LRU in Orleans is a bool Remove(KeyValuePair<K, V>) API, which ConcurrentDictionary<K,V> implements & BitFaster.Caching uses internally in several spots. We use this to remove known-invalid entries under concurrency without clobbering the entry if it's been updated.

ReubenBond avatar Dec 07 '23 17:12 ReubenBond

Why isn't the public abstract bool Remove(TKey key); API sufficient?

rafal-mz avatar Dec 07 '23 17:12 rafal-mz

Sometimes you need to be sure you're removing the key only when the value matches what you already have, either from a previous lookup or cached somewhere. This is important when multiple of these replace operations might run concurrently.

NinoFloris avatar Dec 07 '23 17:12 NinoFloris

Why isn't the public abstract bool Remove(TKey key); API sufficient?

Because of the last part: "under concurrency without clobbering the entry if it's been updated." If an entry has already been replaced by a valid entry, we don't want the Remove operation to evict it since that would require us to recreate and populate the entry again. Recreating the entry is a remote call which is performed outside of the cache itself, serialized via a worker so that all calls for a given entry go to the same worker to prevent a flurry of duplicate work for a given hot key.
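For reference, a small sketch of the compare-and-remove pattern being described, using ConcurrentDictionary: its explicit ICollection<KeyValuePair<TKey, TValue>> implementation removes an entry only when both key and value match (recent .NET versions also expose a public TryRemove(KeyValuePair<TKey, TValue>) overload with the same semantics). This is plain dictionary code, not the proposed RCache API:

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;

var cache = new ConcurrentDictionary<string, object>();
var stale = new object();
cache["entry"] = stale;

// Meanwhile, another actor replaces the value with a fresh one...
cache["entry"] = new object();

// ...so this conditional remove is a no-op: it only removes when both key AND value match,
// which is exactly the "don't clobber an updated entry" behavior described above.
bool removed = ((ICollection<KeyValuePair<string, object>>)cache)
    .Remove(new KeyValuePair<string, object>("entry", stale));

Console.WriteLine(removed);     // False
Console.WriteLine(cache.Count); // 1: the fresh entry is still there
```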

ReubenBond avatar Dec 07 '23 17:12 ReubenBond