Discussion: IDistributedCache - provision and questions
For context, I'm on the devdiv / aspnet team and am currently probably the main owner of the caching story.
`IDistributedCache` is (at least in .NET 8) the main caching backend abstraction, with local-memory, Redis, SQLite, SQL Server, Cosmos, etc. backends. I propose that FASTER should have one too, and have prototyped it here. The good news is: it seems to work great, and the performance is 🔥
`IDistributedCache` is getting some love in .NET 9, so my eyes are on this area.
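For context on the surface area a FASTER backend would need to cover, this is the existing framework abstraction (paraphrased from Microsoft.Extensions.Caching.Abstractions, reproduced here purely for reference):

```csharp
// Microsoft.Extensions.Caching.Distributed.IDistributedCache (for reference):
public interface IDistributedCache
{
    byte[]? Get(string key);
    Task<byte[]?> GetAsync(string key, CancellationToken token = default);
    void Set(string key, byte[] value, DistributedCacheEntryOptions options);
    Task SetAsync(string key, byte[] value, DistributedCacheEntryOptions options, CancellationToken token = default);
    void Refresh(string key);   // resets the sliding-expiration window
    Task RefreshAsync(string key, CancellationToken token = default);
    void Remove(string key);
    Task RemoveAsync(string key, CancellationToken token = default);
}
```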
I have some thoughts from this, though:
- should MSFT provide an official implementation? if so, where should it live? I'm happy for you to take mine
- request for eyeballs; am I doing anything wrong/incomplete? in particular, things like "session length", and also: since I implement expiration, do I need to explicitly remove something, or is returning `false` from the functions layer enough to cause that to happen?
- the serialization layer feels "old school"; there may be a lot of room for improvement there, to improve perf and reduce allocations and memory-copies; happy to have that discussion at length, but increasingly serializers are supporting `IBufferWriter<byte>` and `ReadOnlySequence<byte>` implementations; I'd love to have a conversation about that (a rough sketch of what that could look like follows this list)
- also, stateless serializers rather than serializer factories
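To make the last two bullets concrete, a stateless, buffer-oriented contract could look roughly like this (purely illustrative; the interface name is hypothetical and nothing like it exists in FASTER today):

```csharp
using System.Buffers;

// Hypothetical shape for a modern serializer hook: stateless, stream-free,
// writing into an IBufferWriter<byte> and reading from a ReadOnlySequence<byte>.
public interface IValueSerializer<T>
{
    void Serialize(in T value, IBufferWriter<byte> destination);
    T Deserialize(in ReadOnlySequence<byte> source);
}
```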
Let me know if there is interest in this from the FASTER side; if not, I can try to do my best from the "other side"
FWIW the current impl is already working well!
nb: I just needed to make a minor change, for which I made a PR.
Thanks @mgravell - this is a great start! With a starting point like this, it should be possible to look closely and optimize it much further. I already see opportunities for improvement of at least 2X if not more. Garnet skyrocketing is keeping us a bit busy right now, but we will check it out in the next week or two, and perhaps integrate into our NuGets (ofc, only if you think that makes sense). cc @TedHartMS
I'll compare it tomorrow against my own impl. If I remember correctly, the preferred method for session handling is to cache and reuse the sessions instead of new() every time.
Yes, that is one important optimization. The other is to use VarLenBlittableAllocator if the keys and values are blittable -- instead of paying the cost of object serializer at all.
My own impl is not for a DistributedCache, but for a larger-than-memory local cache, using blittable keys and values (I write fixed-size keys and var-length binary-serializable blobs only). I don't have sliding expiration, just a size check, which then cuts off segments after a certain size. So I have only one log device and not two. My ClientSession type alias looks like:
using ClientSession = ClientSession<SpanByte, SpanByte, SpanByte, SpanByteAndMemory, Empty, SpanByteFunctions<Empty>>;
That being said, these are the two important optimizations, as badrish already wrote, cached sessions and blittable keys/values.
Cache sessions
I used a ConcurrentBag; I'm aware that ConcurrentQueue and/or ObjectPool might be better, depending on the contention on the data structure, but none of my benchmarks ever showed any conclusive difference.
Other Tidbits
- I recommend adding an option to the options type to do DeleteOnClose (an argument to CreateLogDevice); a lot of usages probably do not need to persist the cache across application restarts.
- My LogSettings are more complex and include extra ReadCacheSettings; according to https://microsoft.github.io/FASTER/docs/fasterkv-tuning/ this is useful for larger-than-memory caches, and I followed that page for the other LogSettings too (a rough configuration sketch follows this list).
- My impl implements IEnumerable<...>, mostly for testing (the IDistributedCache interface does not include any enumeration support afaik).
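For illustration, a configuration along those lines might look like the following; this assumes the FASTER v2 API surface (`Devices.CreateLogDevice`, `LogSettings`, `ReadCacheSettings`), and the size values are placeholders rather than recommendations:

```csharp
using System.IO;
using FASTER.core;

// Device that is deleted on dispose -- for caches that don't need to survive restarts.
var device = Devices.CreateLogDevice(
    Path.Combine(Path.GetTempPath(), "cache.hlog"),
    deleteOnClose: true);

var logSettings = new LogSettings
{
    LogDevice = device,
    MemorySizeBits = 25,   // in-memory portion of the log (~32 MB)
    PageSizeBits = 20,     // 1 MB pages
    // Read cache keeps hot records in memory for larger-than-memory workloads,
    // as described on the fasterkv-tuning page.
    ReadCacheSettings = new ReadCacheSettings
    {
        MemorySizeBits = 24,
        PageSizeBits = 20
    }
};

var store = new FasterKV<SpanByte, SpanByte>(1L << 20, logSettings);
```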
Notes about SpanByte
If you go the SpanByte route (which is the blittable route) and want to prevent memory copies, the complexity rises quite fast, as you need to do the async/await dance without async methods. This is required because you need to use `fixed` to create SpanBytes from your keys/values.
This means you need to split into sync and async paths via `ValueTask.IsCompletedSuccessfully`.
In the end such a split method looks like this (for ReadAsync); it is easily 3 or 4 times as long and complex as doing it without the async/await dance, and a lot more error prone, but Marc is probably more aware of async/await shenanigans than most of us, so if anyone would be safe doing that stuff, it is him.
This is only the ReadAsync implementation and related methods/types, I can't post the full implementation without asking legal (and not getting an answer in a timely fashion anyway).
public ValueTask<TValue?> ReadAsync<TKey, TValue>(TKey key, CancellationToken cancellationToken = default) where TKey : unmanaged where TValue : struct, IByteWritable<TValue>
{
var session = GetSession();
return ReadAsyncInternal<TKey, TValue>(key, session, true, cancellationToken);
}
private unsafe ValueTask<TValue?> ReadAsyncInternal<TKey, TValue>(TKey key,
ClientSession session, bool releaseSession, CancellationToken cancellationToken = default)
where TKey : unmanaged where TValue : struct, IByteWritable<TValue>
{
try
{
Span<byte> keySpan = MemoryMarshal.AsBytes(new Span<TKey>(ref key));
fixed (byte* _ = keySpan) // the span only needs to be fixed until ReadAsync returns; for async completions the key is copied internally
{
SpanByte keySpanByte = SpanByte.FromFixedSpan(keySpan);
var task = session.ReadAsync(ref keySpanByte, token: cancellationToken);
if (task.IsCompletedSuccessfully)
{
var (status, memory) = task.Result.Complete();
var result = HandleReadStatus<TValue>(status, releaseSession, session, memory);
return new ValueTask<TValue?>(result);
}
return CompleteReadAsync<TValue>(task, releaseSession, session);
}
}
catch
{
session.Dispose();
throw;
}
}
private TValue? HandleReadStatus<TValue>(Status status, bool releaseSession, ClientSession session, SpanByteAndMemory memory) where TValue : struct, IByteWritable<TValue>
{
TValue? result = default;
if (status.Found) // if found, we deserialize it
{
result = Deserialize<TValue>(memory);
}
else if (!status.NotFound) // everything except not found is an error
{
ThrowInvalidOperation(status);
}
// for single ops we release the session here, for multi ops we wait until they are all finished
if (releaseSession)
{
ReleaseSession(session);
}
return result;
}
private async ValueTask<TValue?> CompleteReadAsync<TValue>(ValueTask<FasterKV<SpanByte, SpanByte>.ReadAsyncResult<SpanByte, SpanByteAndMemory, Empty>> valueTask, bool releaseSession,
ClientSession session) where TValue : struct, IByteWritable<TValue>
{
try
{
var readAsyncResult = await valueTask.ConfigureAwait(false);
var (status, memory) = readAsyncResult.Complete();
return HandleReadStatus<TValue>(status, releaseSession, session, memory);
}
catch
{
session.Dispose();
throw;
}
}
private static TValue Deserialize<TValue>(SpanByteAndMemory spanByteAndMemory) where TValue : struct, IByteWritable<TValue>
{
IMemoryOwner<byte> memory;
int length;
if (spanByteAndMemory.IsSpanByte)
{
// ReSharper disable once PossiblyImpureMethodCallOnReadonlyVariable
(memory, length) = spanByteAndMemory.SpanByte.ToMemoryOwner(MemoryPool<byte>.Shared);
}
else
{
memory = spanByteAndMemory.Memory;
length = spanByteAndMemory.Length;
}
return TValue.Deserialize(memory, length);
}
public interface IByteWritable<TSelf> where TSelf : struct
{
/// <summary>
/// Deserialize the content from memory; the memory is now owned by the implementation and should be disposed when finished
/// </summary>
public static abstract TSelf Deserialize(IMemoryOwner<byte> memory, int length);
/// <summary>
/// Make sure TSelf is a readonly struct. Output receives the written data; if the data does not fit into the existing output
/// span, buffer is populated instead and needs to be disposed by the caller
/// </summary>
public static abstract void Serialize(in TSelf self, ref Span<byte> output, out IMemoryOwner<byte>? buffer, out int written);
}
My tests indicated that in-memory operations completed synchronously and disk-based operations asynchronously, so your tests need to be run with enough data, otherwise you might not hit the async code path.
Lots to digest when at PC! Happy to have ignited some discussion.
Delete on close: agree useful option, but probably not the default; the other key scenario here is "I want the cache to survive web-server restarts, but it doesn't need to be centralised"
Sessions: great tips - will try to look
Blittable: ah, makes sense; I'll see what I can do - I was naively+blindly copying from the cache sample :)
Optimized async fast path: I am very much familiar (I do more than a little async IO inner-loop stuff - sockets, protocol handlers, etc).
Blittable spans - will look, although I wonder if there's a `Memory<T>` trick we're missing there. I also need to contend with a small extra chunk of metadata for the header, so if `ReadOnlySequence<byte>` could be processed as blittable (which it is, semantically, albeit non-contiguous), I could prepend metadata very cheaply (no buffer copy for the main payload). At worst case, I can do a lease+linearize, if FASTER will accept an array segment as a span. Note also that in .NET 9 I hope to add a secondary distributed cache backend API that takes `ReadOnlySequence<byte>`.
Things that I think might be missing:
- checkpoint / persistence (I know very little about FASTER specifically :p)
- key purge, either on spotting a miss when reading, or via periodic random sampling / keyspace walk
In serialization: it sounds like it isn't critical for this scenario (thanks blittable), but I do genuinely think that you could provide a much more direct (and efficient) additional serialization API for scenarios that do need it. I'd be very happy to offer input and suggestions there (I'm very deep in general .NET serialization and caching territory - I just don't know much about the internals of FASTER)
log options: done
SpanByte - I have an unsuccessful branch here; I'm using `SpanByte` keys and values using UTF8 and a buffer (stack or leased, depending on size); write says it works, but read says "not found" - would love any clues if anyone has a moment
specifically, see the `BasicUsage` test; debug output shows (currently, broken state):
- `Read: NotFound` for the genuine "not there" fetch
- `Upsert: NotFound, CreatedRecord` for the initial write
- `Read: NotFound` for the failed second fetch when I expect a value
edit: it looks like the problem is that `value` is the right size but all-zero in `ConcurrentReader`, where I'm checking the expiry
@mgravell Debugging through your code, in CacheFunctions the payload in the expiry check is empty.
I don't know anything about CacheFunctions, sorry; I used Empty everywhere.
Ah, I missed your edit.
I fixed it by switching to `Memory<byte>` instead of `SpanByte`; that's enough for me to make some progress; however... it wasn't the kind of progress I'd hoped for; we now get multiple allocations on the "get" path (I expect 10k here, for the final `byte[]` - the rest of the 36KB is... not ideal):
| Method | KeyLength | PayloadLength | Mean | Error | StdDev | Median | Gen0 | Gen1 | Allocated |
|---------------- |---------- |-------------- |------------:|------------:|------------:|------------:|-------:|-------:|----------:|
| FASTER_Get | 128 | 10240 | 2,786.6 ns | 55.11 ns | 130.98 ns | 2,763.4 ns | 2.1858 | 0.0992 | 36625 B |
| FASTER_Set | 128 | 10240 | 2,132.4 ns | 36.65 ns | 34.28 ns | 2,136.4 ns | 0.5951 | 0.0153 | 10000 B |
| FASTER_GetAsync | 128 | 10240 | 2,860.2 ns | 57.15 ns | 98.58 ns | 2,877.0 ns | 2.1896 | 0.1984 | 36721 B |
| FASTER_SetAsync | 128 | 10240 | 2,240.5 ns | 43.76 ns | 119.06 ns | 2,187.1 ns | 0.5951 | 0.0191 | 10000 B |
I think getting this truly blazing with low allocs is going to take some input from the FASTER team - which is really the key goal here; see if we can get that momentum :)
update: got it down a bit using session reuse, but I have no clue what I'm doing with sessions, so this may not be valid; current state:
| Method | KeyLength | PayloadLength | Mean | Error | StdDev | Gen0 | Gen1 | Allocated |
|---------------- |---------- |-------------- |------------:|------------:|------------:|-------:|-------:|----------:|
| FASTER_Get | 128 | 10240 | 974.2 ns | 16.52 ns | 15.45 ns | 1.5955 | 0.0496 | 26696 B |
| FASTER_Set | 128 | 10240 | 569.3 ns | 5.22 ns | 4.63 ns | 0.0038 | - | 72 B |
| FASTER_GetAsync | 128 | 10240 | 1,051.4 ns | 19.92 ns | 18.64 ns | 1.5984 | 0.0992 | 26792 B |
| FASTER_SetAsync | 128 | 10240 | 780.6 ns | 1.57 ns | 1.31 ns | 0.0038 | - | 72 B |
| SQLite_Get | 128 | 10240 | 12,137.0 ns | 200.33 ns | 187.39 ns | 0.6866 | - | 11616 B |
| SQLite_Set | 128 | 10240 | 76,377.2 ns | 1,482.33 ns | 1,522.25 ns | - | - | 1360 B |
| SQLite_GetAsync | 128 | 10240 | 22,139.6 ns | 297.40 ns | 278.19 ns | 0.7019 | - | 11912 B |
| SQLite_SetAsync | 128 | 10240 | 75,067.2 ns | 1,035.19 ns | 864.43 ns | - | - | 1280 B |
| Rocks_Get | 128 | 10240 | 123.8 ns | 0.79 ns | 0.70 ns | 0.0091 | - | 152 B |
| Rocks_Set | 128 | 10240 | 12,619.8 ns | 68.27 ns | 53.30 ns | - | - | 224 B |
| Rocks_GetAsync | 128 | 10240 | 763.4 ns | 18.70 ns | 55.15 ns | 0.0257 | - | 440 B |
| Rocks_SetAsync | 128 | 10240 | 21,776.1 ns | 228.28 ns | 213.54 ns | 0.0305 | - | 576 B |
(note: Rocks is cheating on the "get")
but I have no clue what I'm doing with sessions
You didn't push your code, at least I couldn't find it, but if it looks similar to this, then you should be good. I missed that part in the details section of my earlier post.
private readonly ConcurrentBag<ClientSession> _sessions = new();
private ClientSession GetSession()
{
if (_sessions.TryTake(out var result))
{
return result;
}
var session = _store.For(_simpleFunctions).NewSession<SpanByteFunctions<Empty>>();
return session;
}
private void ReleaseSession(ClientSession session)
{
_sessions.Add(session);
}
I dispose a session after exceptions, because I simply don't know what state they are in.
I had basically the same - hadn't pushed. I think I've figured out the spanbyte life cycle too, locally - again, not pushed; hopefully tomorrow
Here we go; definitely has legs!
- I switched to `SpanByte` (after figuring out what that means internally) - that gets rid of the excess allocations, leaving just the final `byte[]`
- I also implemented a mockup of the auxiliary API I want to add in .NET 9, which gets rid of that buffer allocation
- and in my testing, I discovered that the Rocks cache is not actually working correctly - we can ignore the cheap "get" values, since they don't return the correct data ("you had one job")
Current benchmarks (edit immediately below same comment - table is now out of date):
| Method | KeyLength | PayloadLength | Mean | Error | StdDev | Gen0 | Gen1 | Allocated |
|---------------------- |---------- |-------------- |------------:|------------:|------------:|-------:|-------:|----------:|
| FASTER_Get | 128 | 10240 | 552.3 ns | 21.16 ns | 18.76 ns | 0.6123 | - | 10264 B |
| FASTER_Set | 128 | 10240 | 826.7 ns | 15.10 ns | 10.91 ns | 0.6123 | - | 10264 B |
| FASTER_GetBuffer | 128 | 10240 | 315.5 ns | 5.65 ns | 4.09 ns | - | - | - |
| FASTER_SetBuffer | 128 | 10240 | 480.6 ns | 7.08 ns | 4.68 ns | - | - | - |
| FASTER_GetAsync | 128 | 10240 | 625.5 ns | 16.34 ns | 14.48 ns | 0.6189 | - | 10360 B |
| FASTER_GetAsyncBuffer | 128 | 10240 | 367.6 ns | 6.24 ns | 2.77 ns | 0.0014 | - | 24 B |
| FASTER_SetAsync | 128 | 10240 | 903.8 ns | 15.46 ns | 11.18 ns | 0.6123 | - | 10264 B |
| FASTER_SetAsyncBuffer | 128 | 10240 | 585.0 ns | 10.59 ns | 3.78 ns | - | - | - |
| SQLite_Get | 128 | 10240 | 12,717.9 ns | 205.08 ns | 73.13 ns | 0.6866 | - | 11616 B |
| SQLite_Set | 128 | 10240 | 75,747.7 ns | 1,300.16 ns | 773.70 ns | 0.6104 | - | 11576 B |
| SQLite_GetAsync | 128 | 10240 | 21,364.4 ns | 403.96 ns | 179.36 ns | 0.7019 | - | 11912 B |
| SQLite_SetAsync | 128 | 10240 | 76,450.3 ns | 1,478.02 ns | 1,153.94 ns | 0.6104 | - | 11496 B |
| Rocks_Get | 128 | 10240 | 843.3 ns | 17.26 ns | 15.30 ns | 0.6218 | - | 10416 B |
| Rocks_Set | 128 | 10240 | 14,182.6 ns | 247.34 ns | 178.84 ns | 0.6104 | - | 10416 B |
| Rocks_GetAsync | 128 | 10240 | 1,876.0 ns | 88.41 ns | 82.69 ns | 0.7935 | 0.0267 | 10704 B |
| Rocks_SetAsync | 128 | 10240 | 24,844.1 ns | 1,422.47 ns | 1,330.58 ns | 0.6409 | - | 10768 B |
Outstanding things I could do with some help on:
- persistence - not sure anything is writing to disk currently; we should fix that (a rough checkpoint sketch follows this list)
- unconstrained growth; is the memory / log data self-managing?
- expiration removal
  - in the observed case (is it enough to return `false` to indicate an expire? does that actually remove anything?)
  - in the unobserved case (some kind of crawl - LRU, chaos, LFU, whatever)
- probably a bunch of things that I don't know that I don't know
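On the persistence point, here is a minimal sketch of explicit checkpointing, assuming the FasterKV v2 checkpoint API (`TakeFullCheckpointAsync` / `Recover`) and a store constructed with `CheckpointSettings`; whether and how often a cache should do this is exactly the open question:

```csharp
using System;
using System.Threading.Tasks;
using FASTER.core;

// Take a full (index + log) checkpoint so the cache contents survive a restart.
// Assumes the store was created with CheckpointSettings { CheckpointDir = ... }.
static async Task CheckpointAsync(FasterKV<SpanByte, SpanByte> store)
{
    var (success, token) = await store.TakeFullCheckpointAsync(CheckpointType.Snapshot);
    if (!success)
        throw new InvalidOperationException("checkpoint could not be started");
    // 'token' identifies this checkpoint; on a later startup, store.Recover()
    // restores from the latest checkpoint found under the configured CheckpointDir.
}
```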
But the numbers look nice, especially for the .NET 9 scenario (`FASTER_GetAsyncBuffer`). I can also possibly improve the "pinning" on the async path; right now I use a simple `using (memory.Pin())` - however, we could do a `fixed` pin with a speculative "does this finish synchronously", and only take the non-stack pin if we need to transition to full async.
/cc @jodydonetti @badrishc
update: now with the deferred pinning on the async get paths:
| Method | KeyLength | PayloadLength | Mean | Error | StdDev | Gen0 | Gen1 | Allocated |
|---------------------- |---------- |-------------- |------------:|------------:|------------:|-------:|-------:|----------:|
| FASTER_Get | 128 | 10240 | 642.5 ns | 37.46 ns | 33.20 ns | 0.6123 | - | 10264 B |
| FASTER_Set | 128 | 10240 | 1,036.3 ns | 22.11 ns | 19.60 ns | 0.6123 | - | 10264 B |
| FASTER_GetBuffer | 128 | 10240 | 388.2 ns | 11.98 ns | 10.62 ns | 0.0019 | - | 32 B |
| FASTER_SetBuffer | 128 | 10240 | 601.8 ns | 16.66 ns | 15.59 ns | - | - | - |
| FASTER_GetAsync | 128 | 10240 | 684.4 ns | 41.90 ns | 34.99 ns | 0.6189 | - | 10360 B |
| FASTER_GetAsyncBuffer | 128 | 10240 | 393.3 ns | 7.84 ns | 6.95 ns | 0.0033 | - | 56 B |
| FASTER_SetAsync | 128 | 10240 | 1,056.5 ns | 34.24 ns | 32.03 ns | 0.6123 | - | 10264 B |
| FASTER_SetAsyncBuffer | 128 | 10240 | 663.0 ns | 16.03 ns | 14.21 ns | - | - | - |
| SQLite_Get | 128 | 10240 | 13,180.0 ns | 277.07 ns | 259.17 ns | 0.6866 | - | 11616 B |
| SQLite_Set | 128 | 10240 | 74,546.1 ns | 2,079.11 ns | 1,944.80 ns | 0.6104 | - | 11576 B |
| SQLite_GetAsync | 128 | 10240 | 27,643.9 ns | 429.91 ns | 190.88 ns | 0.7019 | - | 11912 B |
| SQLite_SetAsync | 128 | 10240 | 81,441.6 ns | 1,887.54 ns | 1,673.26 ns | 0.6104 | - | 11496 B |
| Rocks_Get | 128 | 10240 | 940.4 ns | 36.74 ns | 34.37 ns | 0.6218 | - | 10416 B |
| Rocks_Set | 128 | 10240 | 14,473.9 ns | 282.96 ns | 100.91 ns | 0.6104 | - | 10416 B |
| Rocks_GetAsync | 128 | 10240 | 1,887.8 ns | 108.86 ns | 101.82 ns | 0.7973 | 0.0267 | 10704 B |
| Rocks_SetAsync | 128 | 10240 | 25,086.6 ns | 964.64 ns | 902.33 ns | 0.6409 | - | 10768 B |
This is really great; this evening I'll try to get the latest version from the repo and see if it works fine with FusionCache in the Simulator (I guess it will be flawless, like it was already before).
Will update, thanks!
@jodydonetti I've also thrown it onto NuGet, mostly to stop squatters, but: it is up-to-date
persistence - not sure anything is writing to disk currently; we should fix that
My dumb test approach was simple: write a lot of data, look for the file on disk, then truncate the log and wait for it to disappear.
however, we could do a fixed pin with a speculative "does this finish synchronously", and only take the non-stack pin if we need to transition to full async
I ported my `ValueTask.IsCompletedSuccessfully` flow to your repo (only for GetAsync) and it's about 30-40ns faster than the original code, so less than 10%. That's, as far as I know, the most sync it will ever be for in-memory. So any transition code will probably be slower.
https://github.com/Tornhoof/FASTERCache/commit/136dbd160193503bdeb20f8819a7a09f2398f552
@Tornhoof I got the increase to 20% with full pin defer and async transition ;p
@jodydonetti I've also thrown it onto NuGet, mostly to stop squatters, but: it is up-to-date
Updated (v0.1.21), switched to using the new builder and tested again: everything seems to work well, even when mixing auto-recovery, soft/hard timeouts and other features!
- the serialization layer feels "old school"; there may be a lot of room for improvement there, to improve perf and reduce allocations and memory-copies; happy to have that discussion at length, but increasingly serializers are supporting `IBufferWriter<byte>` and `ReadOnlySequence<byte>` implementations; I'd love to have a conversation about that
- also, stateless serializers rather than serializer factories
+1 for redesigning the serialization layer. I was about to do that in Garnet (and Tsavorite) but I luckily noticed this issue before doing anything. Anything but streams and I'd be very happy to advocate for copying the design over to Tsavorite and Garnet 😄
Related to that "anything but streams" piece: https://github.com/dotnet/runtime/issues/100434 - i.e. "and here's how you fallback for legacy serializers"
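To illustrate that fallback idea (not the design from that issue, just a naive bridge), a legacy Stream-based serializer could be adapted to the buffer-oriented contract sketched earlier in this thread; `IValueSerializer<T>` and `ILegacySerializer<T>` are both hypothetical names, and a real fallback would avoid the extra copies:

```csharp
using System.Buffers;
using System.IO;

// Naive adapter: buffers everything through a MemoryStream so a Stream-only
// serializer can sit behind an IBufferWriter<byte>/ReadOnlySequence<byte> API.
public sealed class LegacySerializerAdapter<T> : IValueSerializer<T>
{
    private readonly ILegacySerializer<T> _inner;
    public LegacySerializerAdapter(ILegacySerializer<T> inner) => _inner = inner;

    public void Serialize(in T value, IBufferWriter<byte> destination)
    {
        using var ms = new MemoryStream();
        _inner.Serialize(value, ms);                              // legacy API writes to a Stream
        destination.Write(ms.GetBuffer().AsSpan(0, (int)ms.Length));
    }

    public T Deserialize(in ReadOnlySequence<byte> source)
    {
        using var ms = new MemoryStream(source.ToArray(), writable: false);
        return _inner.Deserialize(ms);                            // legacy API reads from a Stream
    }
}

public interface ILegacySerializer<T>
{
    void Serialize(T value, Stream destination);
    T Deserialize(Stream source);
}

public interface IValueSerializer<T>   // repeated from the earlier sketch
{
    void Serialize(in T value, IBufferWriter<byte> destination);
    T Deserialize(in ReadOnlySequence<byte> source);
}
```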