Sharing temporary_buffer across shards?
Currently, it is not safe to use distinct temporary_buffer objects which share any underlying buffers across shards (aka "hidden sharing"). In practice, this generally means that it is unsafe even to move temporary buffers received from seastar to another shard, since these will often share underlying buffers with other temporary buffers (e.g., if they are derived from the same input stream). The only safe approach is to copy the underlying bytes in order to pass them to another shard.
This makes it very difficult to do "zero copy" manipulation of such buffers. A typical use case is that a connection is processing requests, these requests are bound for various shards (depending on the contents of the request, e.g., what partition is involved), and we want to pass the entire payload of each request to the owning shard. This should be easy to implement in a zero-copy, thread-safe way: there is no concurrent use of the buffer at all. It is examined on one shard, then the entire buffer is moved to another shard where it is accessed (ownership could stay with the original shard or not; the key is that there is no concurrent access). However, the hidden sharing inside temporary_buffer makes this impossible as far as I can tell.
The only definite reason we can find that temporary_buffer has unsafe hidden sharing is that deleter::impl::refs is accessed non-atomically.
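To make the hazard concrete, here is a minimal stand-alone model of hidden sharing. It is not Seastar code: `control_block`, `model_buffer`, `share()` and `deep_copy()` are hypothetical names chosen for illustration, and the only property taken from the description above is a plain (non-atomic) `unsigned` reference count that several distinct buffer objects update.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <utility>

// Hypothetical stand-in for the state shared behind several buffer objects.
struct control_block {
    unsigned refs = 1;  // plain counter: a data race if two threads touch it
    char* data;
    explicit control_block(std::size_t n) : data(new char[n]) {}
    ~control_block() { delete[] data; }
};

// Hypothetical buffer type; models only the sharing behaviour discussed here.
class model_buffer {
    control_block* _cb;
    char* _begin;
    std::size_t _size;
    model_buffer(control_block* cb, char* b, std::size_t n)
        : _cb(cb), _begin(b), _size(n) {}
public:
    explicit model_buffer(std::size_t n)
        : _cb(new control_block(n)), _begin(_cb->data), _size(n) {}
    model_buffer(model_buffer&& o) noexcept
        : _cb(std::exchange(o._cb, nullptr)), _begin(o._begin), _size(o._size) {}
    model_buffer(const model_buffer&) = delete;
    model_buffer& operator=(const model_buffer&) = delete;
    ~model_buffer() { if (_cb && --_cb->refs == 0) delete _cb; }

    // Returns a distinct object that views part of the same allocation.
    // This is the "hidden sharing": both objects now update one counter.
    model_buffer share(std::size_t pos, std::size_t len) {
        assert(pos + len <= _size);
        ++_cb->refs;
        return model_buffer(_cb, _begin + pos, len);
    }

    // Copies the bytes into a fresh allocation with its own counter.
    model_buffer deep_copy() const {
        model_buffer out(_size);
        std::memcpy(out._begin, _begin, _size);
        return out;
    }

    char* get() { return _begin; }
    std::size_t size() const { return _size; }
    unsigned refs() const { return _cb->refs; }
    bool shares_storage_with(const model_buffer& o) const { return _cb == o._cb; }
};
```

In this model, a buffer obtained through `share()` looks independent but still updates the same counter as its parent, so handing it to another thread races with the parent's destructor. Only the `deep_copy()` result has a counter of its own, which corresponds to the copy workaround described above.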
Two questions:
1. Am I missing something about how to use distinct temporary_buffer objects across threads "zero-copy" without hitting this problem? For example, I feel like ScyllaDB would run into the same unnecessary copy as described above unless it somehow manages to guarantee that the "connection" shard is the same as the one where the operation needs to be processed.
2. Is there any willingness to make deleter::impl::refs a std::atomic<unsigned>? It seems to us this would solve the cross-shard hidden sharing problem, with the possible exception of deleters of type lambda_deleter_impl which have non-trivial lambdas (or lambda destructors) which expect to run on the same shard they were created on. Evidently, this would have a performance impact for all temporary_buffer use. That would look something like this: https://github.com/scylladb/seastar/pull/2451.
Maybe there's a better solution out there than (2).
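For discussion, this is roughly the shape of what (2) proposes, as a stand-alone sketch rather than the actual patch. `atomic_block`, `shared_handle` and `stress()` are hypothetical names, the `destroyed` counter is a test hook only, and the memory orderings are my assumption of what would be needed; none of this is taken from the linked PR.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical shared state with the counter made atomic, as in option (2).
struct atomic_block {
    std::atomic<unsigned> refs{1};
    std::atomic<int>* destroyed;  // test hook: counts destructor runs
    explicit atomic_block(std::atomic<int>* d) : destroyed(d) {}
    ~atomic_block() { destroyed->fetch_add(1); }
};

// Hypothetical handle; each instance is owned by exactly one thread,
// but many instances may point at the same atomic_block.
class shared_handle {
    atomic_block* _b;
public:
    explicit shared_handle(atomic_block* b) noexcept : _b(b) {}
    shared_handle(shared_handle&& o) noexcept : _b(o._b) { o._b = nullptr; }
    shared_handle(const shared_handle&) = delete;
    shared_handle& operator=(const shared_handle&) = delete;
    ~shared_handle() {
        // acq_rel: the thread that frees the block must observe writes
        // made through handles released on other threads.
        if (_b && _b->refs.fetch_sub(1, std::memory_order_acq_rel) == 1) {
            delete _b;
        }
    }
    shared_handle share() const {
        _b->refs.fetch_add(1, std::memory_order_relaxed);
        return shared_handle(_b);
    }
};

// Creates and drops handles to one block from several threads at once.
// Returns how many times the block was destroyed.
inline int stress(unsigned threads, unsigned iterations) {
    std::atomic<int> destroyed{0};
    {
        shared_handle root(new atomic_block(&destroyed));
        std::vector<std::thread> workers;
        for (unsigned t = 0; t < threads; ++t) {
            // Each worker owns its own handle; only the counter is shared.
            workers.emplace_back([h = root.share(), iterations] {
                for (unsigned i = 0; i < iterations; ++i) {
                    shared_handle tmp = h.share();  // released at end of iteration
                }
            });
        }
        for (auto& w : workers) w.join();
    }  // root is released here, after all workers have finished
    return destroyed.load();
}
```

With the atomic counter the block is freed exactly once no matter which thread drops the last handle. The sketch addresses only the counter: it does nothing for deleters whose destructors expect to run on the shard that created them, and every share and release now pays for an atomic read-modify-write.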