
Discussion: IDistributedCache - provision and questions

Open mgravell opened this issue 11 months ago • 20 comments

For context, I'm on the devdiv / aspnet team and am currently probably the main owner of the caching story.

IDistributedCache is (at least in .NET 8) the main caching backend abstraction, with local-memory, Redis, SQLite, SQL Server, Cosmos, etc. backends. I propose that FASTER should have one too, and I have prototyped it here. The good news is: it seems to work great, and the performance is 🔥
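For reference, the (abbreviated) shape of that abstraction; the real interface also declares GetAsync/SetAsync/RefreshAsync/RemoveAsync overloads taking a CancellationToken:

// Microsoft.Extensions.Caching.Distributed.IDistributedCache (abbreviated)
public interface IDistributedCache
{
    byte[]? Get(string key);
    void Set(string key, byte[] value, DistributedCacheEntryOptions options);
    void Refresh(string key); // resets the sliding expiration, if any
    void Remove(string key);
}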

IDistributedCache is getting some love in .NET 9, so my eyes are on this area.

I have some thoughts from this, though:

  1. should MSFT provide an official implementation? if so, where should it live? I'm happy for you to take mine
  2. request for eyeballs; am I doing anything wrong/incomplete? in particular, things like "session length", and also: since I implement expiration, do I need to explicitly remove something, or is returning false from the functions layer enough to cause that to happen?
  3. the serialization layer feels "old school"; there may be a lot of room for improvement there, to improve perf and reduce allocations and memory-copies; happy to have that discussion at length, but increasingly serializers are supporting IBufferWriter<byte> and ReadOnlySequence<byte> implementations; I'd love to have a conversation about that (a sketch of the shape I mean follows this list)
  4. also, stateless serializers rather than serializer factories
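To make point 3 concrete, here's a minimal sketch of the buffer-oriented serializer shape I have in mind - a hypothetical interface, not an existing FASTER or BCL API:

public interface ICacheSerializer<T> // hypothetical
{
    // serialize directly into a pooled, growable destination - no
    // intermediate right-sized byte[] and no Stream
    void Serialize(T value, IBufferWriter<byte> destination);

    // deserialize from potentially non-contiguous buffers
    T Deserialize(in ReadOnlySequence<byte> source);
}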

Let me know if there is interest in this from the FASTER side; if not, I can try to do my best from the "other side"

mgravell avatar Mar 24 '24 11:03 mgravell

FWIW the current impl is already working well!


nb: I just needed to make a minor change, for which I made a PR.

jodydonetti avatar Mar 24 '24 15:03 jodydonetti

Thanks @mgravell - this is a great start! With a starting point like this, it should be possible to look closely and optimize it much further. I already see opportunities for improvement of at least 2X if not more. Garnet skyrocketing is keeping us a bit busy right now, but we will check it out in the next week or two, and perhaps integrate into our NuGets (ofc, only if you think that makes sense). cc @TedHartMS

badrishc avatar Mar 24 '24 21:03 badrishc

I'll compare it tomorrow against my own impl. If I remember correctly, the preferred method for session handling is to cache and reuse the sessions instead of new()-ing one every time.

Tornhoof avatar Mar 24 '24 21:03 Tornhoof

Yes, that is one important optimization. The other is to use VarLenBlittableAllocator if the keys and values are blittable -- instead of paying the cost of object serializer at all.

badrishc avatar Mar 25 '24 02:03 badrishc

My own impl is not for a DistributedCache, but for a larger-than-memory local cache, using blittable keys and values (I write fixed-size keys and var-length binary-serializable blobs only). I don't have sliding expiration, just a size check, which then cuts off segments after a certain size. So I have only one log device and not two. My ClientSession type alias looks like:


using ClientSession = ClientSession<SpanByte, SpanByte, SpanByte, SpanByteAndMemory, Empty, SpanByteFunctions<Empty>>;

That being said, these are the two important optimizations, as badrish already wrote, cached sessions and blittable keys/values.

Cache sessions

I used a ConcurrentBag; I'm aware that ConcurrentQueue and/or ObjectPool might be better, depending on the contention on the data structure, but none of my benchmarks ever showed any conclusive difference.

Other Tidbits

  • I recommend adding an option to the options type to do DeleteOnClose (arg for CreateLogDevice); a lot of usages probably do not need to persist the cache across application restarts.
  • My LogSettings are more complex and have extra ReadCacheSettings; according to https://microsoft.github.io/FASTER/docs/fasterkv-tuning/ that's useful for larger-than-memory caches, and I followed that page for the other log settings (a sketch of both options follows this list).
  • My impl implements IEnumerable<...>, mostly for testing (the IDistributedCache interface does not include any enumeration support afaik)
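A rough sketch of wiring both of those up, hedged against the FASTER v2 API (the sizes are illustrative, not recommendations):

using FASTER.core;

// deleteOnClose: the log file is removed when the device is disposed,
// for apps that don't need the cache to survive a restart
var device = Devices.CreateLogDevice("faster-cache.log", deleteOnClose: true);

var logSettings = new LogSettings
{
    LogDevice = device,
    ReadCacheSettings = new ReadCacheSettings // helps larger-than-memory workloads
    {
        MemorySizeBits = 25, // 32 MB read cache, illustrative
        PageSizeBits = 14,
    },
};

var store = new FasterKV<SpanByte, SpanByte>(1L << 20, logSettings);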

Notes about SpanByte

If you go the SpanByte route (which is the blittable route) and want to prevent memory copies, the complexity rises quite fast, as you need to do the async/await dance without async methods. This is required as you need to use fixed to create SpanBytes from your keys/values. This means you need to split into sync and async paths via ValueTask.IsCompletedSuccessfully.

At the end, such a split method looks like the one below (for ReadAsync); it is easily 3 or 4 times as long and complex as doing it without the async/await dance, and a lot more error-prone, but Marc is probably more aware of async/await shenanigans than most of us, so if anyone would be safe doing that stuff, it is him.

This is only the ReadAsync implementation and related methods/types, I can't post the full implementation without asking legal (and not getting an answer in a timely fashion anyway).


public ValueTask<TValue?> ReadAsync<TKey, TValue>(TKey key, CancellationToken cancellationToken = default) where TKey : unmanaged where TValue : struct, IByteWritable<TValue>
{
    var session = GetSession();
    return ReadAsyncInternal<TKey, TValue>(key, session, true, cancellationToken);
}

private unsafe ValueTask<TValue?> ReadAsyncInternal<TKey, TValue>(TKey key,
    ClientSession session, bool releaseSession, CancellationToken cancellationToken = default)
    where TKey : unmanaged where TValue : struct, IByteWritable<TValue>
{
    try
    {
        Span<byte> keySpan = MemoryMarshal.AsBytes(new Span<TKey>(ref key));
        fixed (byte* _ = keySpan) // the fixed span needs only be fixed until the ReadAsync returns, for async completions it is copied internally
        {
            SpanByte keySpanByte = SpanByte.FromFixedSpan(keySpan);
            var task = session.ReadAsync(ref keySpanByte, token: cancellationToken);
            if (task.IsCompletedSuccessfully)
            {
                var (status, memory) = task.Result.Complete();
                var result = HandleReadStatus<TValue>(status, releaseSession, session, memory);
                return new ValueTask<TValue?>(result);
            }

            return CompleteReadAsync<TValue>(task, releaseSession, session);
        }
    }
    catch
    {
        session.Dispose();
        throw;
    }
}

private TValue? HandleReadStatus<TValue>(Status status, bool releaseSession, ClientSession session, SpanByteAndMemory memory) where TValue : struct, IByteWritable<TValue>
{
    TValue? result = default;
    if (status.Found) // if found, we deserialize it
    {
        result = Deserialize<TValue>(memory);
    }
    else if (!status.NotFound) // everything except not found is an error
    {
        ThrowInvalidOperation(status);
    }

    // for single ops we release the session here, for multi ops we wait until they are all finished
    if (releaseSession)
    {
        ReleaseSession(session);
    }

    return result;
}

private async ValueTask<TValue?> CompleteReadAsync<TValue>(ValueTask<FasterKV<SpanByte, SpanByte>.ReadAsyncResult<SpanByte, SpanByteAndMemory, Empty>> valueTask, bool releaseSession,
    ClientSession session) where TValue : struct, IByteWritable<TValue>
{
    try
    {
        var readAsyncResult = await valueTask.ConfigureAwait(false);
        var (status, memory) = readAsyncResult.Complete();
        return HandleReadStatus<TValue>(status, releaseSession, session, memory);
    }
    catch
    {
        session.Dispose();
        throw;
    }
}

private static TValue Deserialize<TValue>(SpanByteAndMemory spanByteAndMemory) where TValue : struct, IByteWritable<TValue>
{
    IMemoryOwner<byte> memory;
    int length;
    if (spanByteAndMemory.IsSpanByte)
    {
        // ReSharper disable once PossiblyImpureMethodCallOnReadonlyVariable
        (memory, length) = spanByteAndMemory.SpanByte.ToMemoryOwner(MemoryPool<byte>.Shared);
    }
    else
    {
        memory = spanByteAndMemory.Memory;
        length = spanByteAndMemory.Length;
    }

    return TValue.Deserialize(memory, length);
}


public interface IByteWritable<TSelf> where TSelf : struct
{
    /// <summary>
    /// Deserialize the content from memory; the memory is now owned by the implementation and must be disposed when finished
    /// </summary>
    public static abstract TSelf Deserialize(IMemoryOwner<byte> memory, int length);

    /// <summary>
    /// Make sure TSelf is a readonly struct. On return, output holds the written data; if the data does not fit
    /// into the existing output span, buffer is populated instead and must be disposed by the caller
    /// </summary>
    public static abstract void Serialize(in TSelf self, ref Span<byte> output, out IMemoryOwner<byte>? buffer, out int written);
}


My tests indicated that in-memory operations complete synchronously and disk-based operations asynchronously, so your tests need to run with enough data; otherwise you might not hit the async code path.

Tornhoof avatar Mar 25 '24 07:03 Tornhoof

Lots to digest when at PC! Happy to have ignited some discussion.

Delete on close: agree, useful option, but probably not the default; the other key scenario here is "I want the cache to survive web-server restarts, but it doesn't need to be centralised"

Sessions: great tips - will try to look

Blittable: ah, makes sense; I'll see what I can do - I was naively+blindly copying from the cache sample :)

Optimized async fast path: I am very much familiar (I do more than a little async IO inner-loop stuff - sockets, protocol handlers, etc).

Blittable spans - will look, although I wonder if there's a Memory<T> trick we're missing there. I also need to contend with a small extra chunk of metadata for the header, so if ReadOnlySequence<byte> could be processed as blittable (which it is, semantically, albeit non-contiguous), I could prepend metadata very cheaply (no buffer copy for the main payload). At worst, I can do a lease+linearize, if FASTER will accept an array segment as a span. Note also that in .NET 9 I hope to add a secondary distributed cache backend API that takes ReadOnlySequence<byte> as the "set" payload type, to carry this "no right-sized single-use arrays" metaphor end-to-end. And IBufferWriter<byte> for the output payload mechanism.
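As a strawman for that lease+linearize worst case (the helper name is mine, not from the prototype):

using System.Buffers;

// copy a small metadata header plus a non-contiguous payload into a
// single leased buffer, so FASTER can consume it as one contiguous span
static byte[] LeaseAndLinearize(ReadOnlySpan<byte> header,
    in ReadOnlySequence<byte> payload, ArrayPool<byte> pool, out int length)
{
    length = header.Length + checked((int)payload.Length);
    byte[] leased = pool.Rent(length);
    header.CopyTo(leased);
    payload.CopyTo(leased.AsSpan(header.Length, (int)payload.Length));
    return leased; // caller must return the array to the pool when done
}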

Things that I think might be missing:

  • checkpoint / persistence (I know very little about FASTER specifically :p)
  • key purge, either on spotting a miss when reading, or via periodic random sampling / keyspace walk (a sketch of the observed case follows this list)
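On the observed case, a hedged pseudocode sketch of the expiry check in the functions layer; the ConcurrentReader signature is approximated from FASTER v2's SpanByteFunctions and may differ between versions, and the 8-byte expiry header layout is an assumption of this sketch, not necessarily the prototype's format:

sealed class ExpiryFunctions : SpanByteFunctions<Empty>
{
    public override bool ConcurrentReader(ref SpanByte key, ref SpanByte input,
        ref SpanByte value, ref SpanByteAndMemory dst, ref ReadInfo readInfo)
    {
        // assumed layout: first 8 bytes of the value are absolute expiry ticks
        long expiryTicks = BitConverter.ToInt64(value.AsReadOnlySpan());
        if (expiryTicks < DateTime.UtcNow.Ticks)
        {
            // report "not found"; whether this alone also removes the record
            // is exactly the open question above
            return false;
        }
        return base.ConcurrentReader(ref key, ref input, ref value, ref dst, ref readInfo);
    }
}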

In serialization: it sounds like it isn't critical for this scenario (thanks blittable), but I do genuinely think that you could provide a much more direct (and efficient) additional serialization API for scenarios that do need it. I'd be very happy to offer input and suggestions there (I'm very deep in general .NET serialization and caching territory - I just don't know much about the internals of FASTER)


mgravell avatar Mar 25 '24 07:03 mgravell

log options: done

SpanByte - I have an unsuccessful branch here; I'm using SpanByte keys and values via UTF8 and a buffer (stack or leased, depending on size); the write says it works, but the read says "not found" - would love any clues if anyone has a moment

specifically, see the BasicUsage test; debug output shows (in the current, broken state):

  • Read: NotFound for the genuine "not there" fetch
  • Upsert: NotFound, CreatedRecord for the initial write
  • Read: NotFound for the failed second fetch when I expect a value

edit: it looks like the problem is that the value is the right size but all-zero in ConcurrentReader, where I'm checking the expiry

mgravell avatar Mar 25 '24 12:03 mgravell

@mgravell Debugging through your code, in CacheFunctions the payload in the expiry check is empty. I don't know anything about CacheFunctions, sorry; I used Empty everywhere.

Ah, I missed your edit.

Tornhoof avatar Mar 25 '24 15:03 Tornhoof

I fixed it by switching to Memory<byte> instead of SpanByte; that's enough for me to make some progress; however... it wasn't the kind of progress I'd hoped for; we now get multiple allocations on the "get" path (I expect 10k here, for the final byte[] - the rest of the 36KB is... not ideal):

| Method          | KeyLength | PayloadLength | Mean        | Error       | StdDev      | Median      | Gen0   | Gen1   | Allocated |
|---------------- |---------- |-------------- |------------:|------------:|------------:|------------:|-------:|-------:|----------:|
| FASTER_Get      | 128       | 10240         |  2,786.6 ns |    55.11 ns |   130.98 ns |  2,763.4 ns | 2.1858 | 0.0992 |   36625 B |
| FASTER_Set      | 128       | 10240         |  2,132.4 ns |    36.65 ns |    34.28 ns |  2,136.4 ns | 0.5951 | 0.0153 |   10000 B |
| FASTER_GetAsync | 128       | 10240         |  2,860.2 ns |    57.15 ns |    98.58 ns |  2,877.0 ns | 2.1896 | 0.1984 |   36721 B |
| FASTER_SetAsync | 128       | 10240         |  2,240.5 ns |    43.76 ns |   119.06 ns |  2,187.1 ns | 0.5951 | 0.0191 |   10000 B |

I think getting this truly blazing with low allocs is going to take some input from the FASTER team - which is really the key goal here; see if we can get that momentum :)


update: got it down a bit using session reuse, but I have no clue what I'm doing with sessions, so this may not be valid; current state:

| Method          | KeyLength | PayloadLength | Mean        | Error       | StdDev      | Gen0   | Gen1   | Allocated |
|---------------- |---------- |-------------- |------------:|------------:|------------:|-------:|-------:|----------:|
| FASTER_Get      | 128       | 10240         |    974.2 ns |    16.52 ns |    15.45 ns | 1.5955 | 0.0496 |   26696 B |
| FASTER_Set      | 128       | 10240         |    569.3 ns |     5.22 ns |     4.63 ns | 0.0038 |      - |      72 B |
| FASTER_GetAsync | 128       | 10240         |  1,051.4 ns |    19.92 ns |    18.64 ns | 1.5984 | 0.0992 |   26792 B |
| FASTER_SetAsync | 128       | 10240         |    780.6 ns |     1.57 ns |     1.31 ns | 0.0038 |      - |      72 B |
| SQLite_Get      | 128       | 10240         | 12,137.0 ns |   200.33 ns |   187.39 ns | 0.6866 |      - |   11616 B |
| SQLite_Set      | 128       | 10240         | 76,377.2 ns | 1,482.33 ns | 1,522.25 ns |      - |      - |    1360 B |
| SQLite_GetAsync | 128       | 10240         | 22,139.6 ns |   297.40 ns |   278.19 ns | 0.7019 |      - |   11912 B |
| SQLite_SetAsync | 128       | 10240         | 75,067.2 ns | 1,035.19 ns |   864.43 ns |      - |      - |    1280 B |
| Rocks_Get       | 128       | 10240         |    123.8 ns |     0.79 ns |     0.70 ns | 0.0091 |      - |     152 B |
| Rocks_Set       | 128       | 10240         | 12,619.8 ns |    68.27 ns |    53.30 ns |      - |      - |     224 B |
| Rocks_GetAsync  | 128       | 10240         |    763.4 ns |    18.70 ns |    55.15 ns | 0.0257 |      - |     440 B |
| Rocks_SetAsync  | 128       | 10240         | 21,776.1 ns |   228.28 ns |   213.54 ns | 0.0305 |      - |     576 B |

(note: Rocks is cheating on the "get")

mgravell avatar Mar 25 '24 15:03 mgravell

but I have no clue what I'm doing with sessions

You didn't push your code, at least I couldn't find it, but if it looks similar to this, then you should be good; I missed that part in the details section of my earlier post.

private readonly ConcurrentBag<ClientSession> _sessions = new();

private ClientSession GetSession()
{
    if (_sessions.TryTake(out var result))
    {
        return result;
    }

    var session = _store.For(_simpleFunctions).NewSession<SpanByteFunctions<Empty>>();
    return session;
}

private void ReleaseSession(ClientSession session)
{
    _sessions.Add(session);
}

I dispose a session after exceptions, because I simply don't know what state it is in.

Tornhoof avatar Mar 25 '24 18:03 Tornhoof

I had basically the same - hadn't pushed. I think I've figured out the SpanByte life cycle too, locally - again, not pushed; hopefully tomorrow


mgravell avatar Mar 25 '24 19:03 mgravell

Here we go; definitely has legs!

  • I switched to SpanByte (after figuring out what that means internally) - that gets rid of the excess allocations, leaving just the final byte[]
  • I also implemented a mockup of the auxiliary API I want to add in .NET 9, which gets rid of that buffer allocation
  • and in my testing, I discovered that the Rocks cache is not actually working correctly - we can ignore the cheap "get" values, since they don't return the correct data ("you had one job")

Current benchmarks (see the update further down in this same comment - this table is now out of date):

| Method                | KeyLength | PayloadLength | Mean        | Error       | StdDev      | Gen0   | Gen1   | Allocated |
|---------------------- |---------- |-------------- |------------:|------------:|------------:|-------:|-------:|----------:|
| FASTER_Get            | 128       | 10240         |    552.3 ns |    21.16 ns |    18.76 ns | 0.6123 |      - |   10264 B |
| FASTER_Set            | 128       | 10240         |    826.7 ns |    15.10 ns |    10.91 ns | 0.6123 |      - |   10264 B |
| FASTER_GetBuffer      | 128       | 10240         |    315.5 ns |     5.65 ns |     4.09 ns |      - |      - |         - |
| FASTER_SetBuffer      | 128       | 10240         |    480.6 ns |     7.08 ns |     4.68 ns |      - |      - |         - |
| FASTER_GetAsync       | 128       | 10240         |    625.5 ns |    16.34 ns |    14.48 ns | 0.6189 |      - |   10360 B |
| FASTER_GetAsyncBuffer | 128       | 10240         |    367.6 ns |     6.24 ns |     2.77 ns | 0.0014 |      - |      24 B |
| FASTER_SetAsync       | 128       | 10240         |    903.8 ns |    15.46 ns |    11.18 ns | 0.6123 |      - |   10264 B |
| FASTER_SetAsyncBuffer | 128       | 10240         |    585.0 ns |    10.59 ns |     3.78 ns |      - |      - |         - |
| SQLite_Get            | 128       | 10240         | 12,717.9 ns |   205.08 ns |    73.13 ns | 0.6866 |      - |   11616 B |
| SQLite_Set            | 128       | 10240         | 75,747.7 ns | 1,300.16 ns |   773.70 ns | 0.6104 |      - |   11576 B |
| SQLite_GetAsync       | 128       | 10240         | 21,364.4 ns |   403.96 ns |   179.36 ns | 0.7019 |      - |   11912 B |
| SQLite_SetAsync       | 128       | 10240         | 76,450.3 ns | 1,478.02 ns | 1,153.94 ns | 0.6104 |      - |   11496 B |
| Rocks_Get             | 128       | 10240         |    843.3 ns |    17.26 ns |    15.30 ns | 0.6218 |      - |   10416 B |
| Rocks_Set             | 128       | 10240         | 14,182.6 ns |   247.34 ns |   178.84 ns | 0.6104 |      - |   10416 B |
| Rocks_GetAsync        | 128       | 10240         |  1,876.0 ns |    88.41 ns |    82.69 ns | 0.7935 | 0.0267 |   10704 B |
| Rocks_SetAsync        | 128       | 10240         | 24,844.1 ns | 1,422.47 ns | 1,330.58 ns | 0.6409 |      - |   10768 B |

Outstanding things I could do with help from people with:

  • persistence - not sure anything is writing to disk currently; we should fix that
  • unconstrained growth; is the memory / log data self-managing?
  • expiration removal
    • in the observed case (is it enough to return false to indicate an expire? does that actually remove anything?)
    • in the unobserved case (some kind of crawl - LRU, chaos, LFU, whatever)
  • probably a bunch of things that I don't know that I don't know

But the numbers look nice, especially for the .NET 9 scenario (FASTER_GetAsyncBuffer). I can also possibly improve the "pinning" on the async path; right now I use a simple using (memory.Pin()) - however, we could do a fixed pin with a speculative "does this finish synchronously" check, and only take the non-stack pin if we need to transition to full async (sketch below).
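For reference, a hedged sketch of that deferred-pin flow (the plumbing is illustrative and reuses the pooled-session idea from earlier in the thread; it leans on the earlier observation that FASTER copies the key internally on async completions):

private unsafe ValueTask<byte[]?> GetAsync(byte[] key)
{
    Span<byte> keySpan = key;
    fixed (byte* _ = keySpan) // stack pin, held only while this frame runs
    {
        var fixedKey = SpanByte.FromFixedSpan(keySpan);
        var pending = _session.ReadAsync(ref fixedKey);
        if (pending.IsCompletedSuccessfully) // speculative sync path: no heap pin needed
        {
            var (status, output) = pending.Result.Complete();
            return new ValueTask<byte[]?>(ToArray(status, output));
        }
        // async completion: the key was copied internally, so the stack pin
        // can lapse as soon as we return; await in a separate method
        return AwaitRead(pending);
    }
}

private static async ValueTask<byte[]?> AwaitRead(
    ValueTask<FasterKV<SpanByte, SpanByte>.ReadAsyncResult<SpanByte, SpanByteAndMemory, Empty>> pending)
{
    var (status, output) = (await pending.ConfigureAwait(false)).Complete();
    return ToArray(status, output);
}

private static byte[]? ToArray(Status status, SpanByteAndMemory output)
{
    if (!status.Found) return null;
    if (output.IsSpanByte) return output.SpanByte.AsReadOnlySpan().ToArray();
    using var owner = output.Memory;
    return owner.Memory.Span.Slice(0, output.Length).ToArray();
}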

/cc @jodydonetti @badrishc

update: now with the deferred pinning on the async get paths:

| Method                | KeyLength | PayloadLength | Mean        | Error       | StdDev      | Gen0   | Gen1   | Allocated |
|---------------------- |---------- |-------------- |------------:|------------:|------------:|-------:|-------:|----------:|
| FASTER_Get            | 128       | 10240         |    642.5 ns |    37.46 ns |    33.20 ns | 0.6123 |      - |   10264 B |
| FASTER_Set            | 128       | 10240         |  1,036.3 ns |    22.11 ns |    19.60 ns | 0.6123 |      - |   10264 B |
| FASTER_GetBuffer      | 128       | 10240         |    388.2 ns |    11.98 ns |    10.62 ns | 0.0019 |      - |      32 B |
| FASTER_SetBuffer      | 128       | 10240         |    601.8 ns |    16.66 ns |    15.59 ns |      - |      - |         - |
| FASTER_GetAsync       | 128       | 10240         |    684.4 ns |    41.90 ns |    34.99 ns | 0.6189 |      - |   10360 B |
| FASTER_GetAsyncBuffer | 128       | 10240         |    393.3 ns |     7.84 ns |     6.95 ns | 0.0033 |      - |      56 B |
| FASTER_SetAsync       | 128       | 10240         |  1,056.5 ns |    34.24 ns |    32.03 ns | 0.6123 |      - |   10264 B |
| FASTER_SetAsyncBuffer | 128       | 10240         |    663.0 ns |    16.03 ns |    14.21 ns |      - |      - |         - |
| SQLite_Get            | 128       | 10240         | 13,180.0 ns |   277.07 ns |   259.17 ns | 0.6866 |      - |   11616 B |
| SQLite_Set            | 128       | 10240         | 74,546.1 ns | 2,079.11 ns | 1,944.80 ns | 0.6104 |      - |   11576 B |
| SQLite_GetAsync       | 128       | 10240         | 27,643.9 ns |   429.91 ns |   190.88 ns | 0.7019 |      - |   11912 B |
| SQLite_SetAsync       | 128       | 10240         | 81,441.6 ns | 1,887.54 ns | 1,673.26 ns | 0.6104 |      - |   11496 B |
| Rocks_Get             | 128       | 10240         |    940.4 ns |    36.74 ns |    34.37 ns | 0.6218 |      - |   10416 B |
| Rocks_Set             | 128       | 10240         | 14,473.9 ns |   282.96 ns |   100.91 ns | 0.6104 |      - |   10416 B |
| Rocks_GetAsync        | 128       | 10240         |  1,887.8 ns |   108.86 ns |   101.82 ns | 0.7973 | 0.0267 |   10704 B |
| Rocks_SetAsync        | 128       | 10240         | 25,086.6 ns |   964.64 ns |   902.33 ns | 0.6409 |      - |   10768 B |

mgravell avatar Mar 26 '24 10:03 mgravell

This is really great; this evening I'll try to get the latest version from the repo and see if it works fine with FusionCache in the Simulator (I guess it will be flawless, as it already was before).

Will update, thanks!

jodydonetti avatar Mar 26 '24 14:03 jodydonetti

@jodydonetti I've also thrown it onto NuGet, mostly to stop squatters, but: it is up-to-date

mgravell avatar Mar 26 '24 14:03 mgravell

persistence - not sure anything is writing to disk currently; we should fix that

My dumb test approach was simple: write a lot of data, look for the file on disk, then truncate the log and wait for it to disappear.
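For the truncation step, a hedged sketch against the FASTER v2 LogAccessor surface (store.Log); the budget is illustrative:

const long maxLogBytes = 1L << 30; // 1 GB budget, illustrative only
long used = store.Log.TailAddress - store.Log.BeginAddress;
if (used > maxLogBytes)
{
    // drop the oldest part of the hybrid log; records below the new begin
    // address are discarded and their on-disk segments eventually deleted
    store.Log.ShiftBeginAddress(store.Log.TailAddress - maxLogBytes);
}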

Tornhoof avatar Mar 26 '24 15:03 Tornhoof

however, we could do a fixed pin with a speculative "does this finish synchronously", and only take the non-stack pin if we need to transition to full async

I ported my ValueTask.IsCompletedSuccessfully flow to your repo (only for GetAsync) and it's about 30-40 ns faster than the original code, so less than 10%. That's, as far as I know, the most sync it will ever be for in-memory, so any transition code will probably be slower.

https://github.com/Tornhoof/FASTERCache/commit/136dbd160193503bdeb20f8819a7a09f2398f552

Tornhoof avatar Mar 26 '24 20:03 Tornhoof

@Tornhoof I got the increase to 20% with full pin defer and async transition ;p

mgravell avatar Mar 27 '24 12:03 mgravell

@jodydonetti I've also thrown it onto NuGet, mostly to stop squatters, but: it is up-to-date

Updated (v0.1.21), switched to using the new builder and tested again: everything seems to work well, even when mixing auto-recovery, soft/hard timeouts and other features!

jodydonetti avatar Mar 27 '24 20:03 jodydonetti

  1. the serialization layer feels "old school"; there may be a lot of room for improvement there, to improve perf and reduce allocations and memory-copies; happy to have that discussion at length, but increasingly serializers are supporting IBufferWriter<byte> and ReadOnlySequence<byte> implementations; I'd love to have a conversation about that
  2. also, stateless serializers rather than serializer factories

+1 for redesigning the serialization layer. I was about to do that in Garnet (and Tsavorite) but luckily noticed this issue before doing anything. Anything but streams, and I'd be very happy to advocate for copying the design over to Tsavorite and Garnet 😄

PaulusParssinen avatar Mar 29 '24 09:03 PaulusParssinen

Related to that "anything but streams" piece: https://github.com/dotnet/runtime/issues/100434 - i.e. "and here's how you fall back for legacy serializers"

mgravell avatar Mar 29 '24 09:03 mgravell