FASTER use MemoryMarshal.GetReference to implement FromFixedSpan

Note:

makes the Span<byte> and ReadOnlySpan<byte> paths consistent
avoids fixed stack gynmastics overheads (minor but non-zero)
works correctly for zero-length spans (i.e. doesn't give IndexOutOfRangeException)

Mar 28 '24 17:03 mgravell

Need this in Tsavorite ...

Apr 02 '24 00:04 badrishc

@badrishc want me to repeat these over in Garnet/Tsavorite? I'll be honest: I didn't realise there were two different cores until this morning... (also: I learned what Tsavorite is, in the non-tech sense, so: yay?)

Apr 02 '24 14:04 mgravell

Good news: no more allocations (this is on the "always turns out to be synchronous" path)

Can I check that I've interpreted your meaning correctly?

Old version:

// non-async context
var pending = session.WhateverAsync(...);
if (!pending.IsCompletedSuccessfully) return Awaited(pending, ...) // async helper method
var result = pending.GetAwaiter().GetResult(); // actually sync
if (result.Status.IsPending) 
{
    (status, output) = result.Complete(); // don't expect to ever hit this
}

new version:

// non-async context
var status = session.Whatever(...); // note sync API
if (status.IsPending) return Awaited(...); // async helper method

where the new async helper method does

var outputs = await session.CompletePendingWithOutputsAsync(token: token);
// iterate exactly one output, etc

Is that what you meant?

Results below; "S" vs "NS" is "sliding" vs "non-sliding" (expiration behavior)

| Method                    | KeyLength | PayloadLength | Mean        | Error       | StdDev    | Gen0   | Gen1   | Allocated |
|-------------------------- |---------- |-------------- |------------:|------------:|----------:|-------:|-------:|----------:|
| FASTER_S_Get              | 128       | 10240         |    570.4 ns |    10.34 ns |   4.59 ns | 0.6123 |      - |   10264 B |
| FASTER_S_Set              | 128       | 10240         |    819.2 ns |    15.91 ns |  10.52 ns | 0.6123 |      - |   10264 B |
| FASTER_S_GetBuffer        | 128       | 10240         |    366.8 ns |     6.44 ns |   2.86 ns |      - |      - |         - |
| FASTER_S_GetInPlace       | 128       | 10240         |    289.2 ns |     5.20 ns |   1.86 ns |      - |      - |         - |
| FASTER_S_SetBuffer        | 128       | 10240         |  2,487.6 ns |    45.06 ns |  16.07 ns |      - |      - |         - |
| FASTER_S_GetAsync         | 128       | 10240         |    641.0 ns |    12.24 ns |   3.18 ns | 0.6180 |      - |   10336 B |
| FASTER_S_GetAsyncBuffer   | 128       | 10240         |    376.7 ns |     6.80 ns |   3.02 ns |      - |      - |         - |
| FASTER_S_GetAsyncInPlace  | 128       | 10240         |    304.8 ns |     5.52 ns |   2.89 ns |      - |      - |         - |
| FASTER_S_SetAsync         | 128       | 10240         |    771.3 ns |    12.52 ns |   5.56 ns | 0.6123 |      - |   10264 B |
| FASTER_S_SetAsyncBuffer   | 128       | 10240         |  2,493.8 ns |    36.99 ns |   9.61 ns |      - |      - |         - |
| FASTER_NS_Get             | 128       | 10240         |    511.2 ns |     9.82 ns |   7.67 ns | 0.6123 |      - |   10264 B |
| FASTER_NS_Set             | 128       | 10240         |    882.3 ns |    11.22 ns |   5.87 ns | 0.6123 |      - |   10264 B |
| FASTER_NS_GetBuffer       | 128       | 10240         |    358.1 ns |     6.66 ns |   1.73 ns |      - |      - |         - |
| FASTER_NS_GetInPlace      | 128       | 10240         |    293.1 ns |     2.16 ns |   0.33 ns |      - |      - |         - |
| FASTER_NS_SetBuffer       | 128       | 10240         |  2,496.0 ns |    28.33 ns |   4.38 ns |      - |      - |         - |
| FASTER_NS_GetAsync        | 128       | 10240         |    558.1 ns |    10.03 ns |   3.58 ns | 0.6180 |      - |   10336 B |
| FASTER_NS_GetAsyncBuffer  | 128       | 10240         |    329.3 ns |     5.30 ns |   1.38 ns |      - |      - |         - |
| FASTER_NS_GetAsyncInPlace | 128       | 10240         |    295.1 ns |     5.55 ns |   2.90 ns |      - |      - |         - |
| FASTER_NS_SetAsync        | 128       | 10240         |    852.6 ns |    13.75 ns |   7.19 ns | 0.6123 |      - |   10264 B |
| FASTER_NS_SetAsyncBuffer  | 128       | 10240         |  4,291.7 ns |    20.64 ns |   3.19 ns |      - |      - |         - |

Apr 02 '24 14:04 mgravell

Sorry, just saw this thread. Yes we world prefer to get these changes over to Garnet!

Apr 05 '24 13:04 badrishc