foundationdb-dotnet-client Replacing Slice with Span<byte>/Memory<byte>

Replacing Slice with Span<byte>/Memory<byte>

Open KrzysFR opened this issue 6 years ago • 3 comments

When the binding was written, there were no viable solution to work with binary keys: both byte[] or ArraySegment<byte> are to cumbersome to use, and MemoryStream is maybe fine for (large) values, but impractical for small keys.

The Slice struct was created to closely resemble the string API. The Tuple layer (which also predates support for ValueTuple<..> in .NET) was also added and could serialize tuples into binary keys efficiently.

There is now support for Span<T> and Memory<T> natively in .NET Core 2.1 (fast and optimized by the JIT) or via a the System.Memory NuGet package for .NET Framework (not optimized, slower).

The low-level native API that talks to the C API (via interop) should definitely use either Span<byte>/ReadOnlySpan<byte>, but the question is whether this should also be exposed to the Binding API:

Span<byte> does not have the full API that Slice currently offer. But this can be fixed either via extension methods, or by make Slice a wrapper struct on top of spans
Spans are stack-only, and cannot be stored on heap, and more importantly cannot be used in async methods. Memory<T> is used for this purpose, but the current implementation in .NET Framework (not JIT optimized) is a lot slower than Slice currently.

The top level API would probably need to accept Span and Memory anyway: there will be a lot more API in mscorlib/corefx that will natively accept spans, so the need for Slice will become less and less important.

Apr 20 '18 09:04 KrzysFR

This is a bit problematic because Span<T>/Memory<T> are not part of .NET Standard 2.0 and so very few classes in the BCL accept them, outside of .NET Core 2.x. There is talk to add it to .NET Standard 2.1, but the release date still unknown.

Another thing is that Spans still do not offer a complete enough API to be usable as the main type for keys/values of a key/value store. It seems to have been mostly used for buffering, parsing, chunking, pooling (where the perf impact is the biggest), but is a bit lacking for generating/parsing keys and values in a mostly-async context.

Current opinion: keep the Slice type as the main API type, but add overloads that accept ReadOnlySpan<T> as inputs, and think of a nice way to also accept Span as outputs (maybe using SpanAction<..>?) for very perf intensive cases? The biggest issue is to make sure that the lifetime of the span is well defined and never cross any await boundary.

As for the framework requirements for Span support:

Use the NuGet packages for .NET 4.7.x/.NET Standard 2.0 targets, which will be a bit slower overall.
Bump .NET Core to v2.1 minimum (maybe even 2.2?) so we can use all the latest and greatest in perf, and hope that most people would use that version instead?

Some remarks:

FoundationDB is probably faster on Linux than Windows anyway, so it makes sense that the .NET Core platform would get the best performance possible. The "old" .NET Framework 4.x version would only work on Windows which as a bit less performance anyway.
Nobody probably uses this library with older .NET 4.x code, and new developments would probably target .NET Core 2.x so it makes sense to favor this platform going forward.
.NET Standard 2.0 is between two chairs due to the lack of native support for Spans and IAsyncEnumerable. App developers are already confronted to this issue, and this will probably force everyone to target netcoreapp2_# even for classes library.
Only library developers that want to use FoundationDB, but still be .NET Standard 2.0 compatible for maximum reach, are impacted. But again they already are because of the Span/IAsyncEnumerable issue.

Oct 25 '18 19:10 KrzysFR

Some work has been done recently to start incorporating spans in the API.

More work is required to really take advantage of the zero-copy aspect, especially for the parsing of keys and values.

Nov 27 '19 14:11 KrzysFR

In master, most of the "inbound" slices (from app code to the binding) are replaced with ReadOnlySpace<byte> and extensions methods have been added with the old Slice arguments.

This is required because Slice can represent the Nil key which is different from the Empty key, while ReadOnlySpace<byte> does not have the concept of null (only empty).

For "outbound" slices (returned by the binding to the caller), I'm keeping "Slice" for the moment. We cannot use spans for these (Task<ReadOnlySpan<byte>> is not legal). We could use ReadOnlyMemory<byte> instead, but there is still the issue of ownership of the memory buffer which makes it more difficult to deal with buffer pooling. We should investigate IOwnedMemory<byte> to see if this could help on that topic?

Ideally, the memory used by a transaction could be pooled, and we would have a way to safely return the memory to the pool once the application is done with it. But currently, unpacking a tuple from a slice keeps references the underlying memory buffer. We don't know how long the application will keep this tuple alive, and it could even decide to store it in some static variable somewhere :/

Jan 13 '20 23:01 KrzysFR

foundationdb-dotnet-client foundationdb-dotnet-client copied to clipboard

Replacing Slice with Span<byte>/Memory<byte>

foundationdb-dotnet-client
foundationdb-dotnet-client copied to clipboard