tokio-uring
tokio-uring copied to clipboard
API: Buffer initialise level
Originally in #215
This issue proposes a change in API, such that set_init tracks the current initialization level, rather than the historial watermark The motiviating example is buffer reuse
Invalid Code
let mut buf = Vec::with_capacity(2048);
loop {
b = socket.recv(buf).await.1;
let _ = target.write_at(b,0).await;
}
To reuse the buffer, you must instead incur the overhead of a slice, because initialization holds the historical fill level, not the current
let mut buf = Vec::with_capacity(2048);
loop {
(r, b) = socket.recv(buf).await;
let _ = target.write_all_at(b.slice(0..r.unwrap()),0).await;
}
I propose that calling set_init()
with a value should not maintain the historical watermark, but instead set the curent level of initialized bytes. I have a number of reasons
- Maintains Safety. It doesn't make the API any less safe to do this. It is still impossible to access uninitiatlised bytes if the level is lower than the historical max.
- Improved safety. Should you be using the bytes as a proxy for some type
T
, knowing the exact size ofT
tracked in the buffer reduces footguns. - Speed. Buffer reuse should not incur Slice overhead
The best stated argument for the status quo is from @mzabaluev , which i quote
I agree that this API is not intuitive, but this is what we have to account for two separate concerns:
- Tracking the length of initialized bytes in the buffer. There is no reason to ever mark bytes as uninitialized after they've been initialized, and the only purpose of this water mark is to prevent safe access into uninitialized data.
The proposal does not infringe on the ability of the API to limit access to uninitialized data.
- Tracking the length of the data filled by the last read operation.
The status quo fails to manage this without a runtime penalty, but wit the proposal, would also manage it.
This may point to the std group wanting both init
and filled
in the owned buffer.
Its interesting that it goes down the concrete type with destructor rather than trait object route. Presumably on code generation efficiency grounds.
Its interesting that it goes down the concrete type with destructor rather than trait object route. Presumably on code generation efficiency grounds.
Yes. Would help for @Noah-Kennedy and @carllerche weigh in. Those two and Nick should be on the same page for the future otherwise I think the rest of us are spinning our wheels with very little purpose. I guess there's always a chance someone else discovers something one of them hadn't foreseen, but given how long Rust has been used for writing clients and servers, I would doubt it.
We could shift entirely from the IoBuf
and IoBufMut
traits to owned-buf. ~It could also replace FixedBuf
~. The OwnedSlice
and OwnedCursor
variants in that library would also replace our slicing infrastructure fully, and i think also the BoundedBuf
trait. It'd actually be quite a significant simplification of our API
Can't replace Fixedbuf, because it doesn't hold enough information to identify the registry.
Can't replace Fixedbuf, because it doesn't hold enough information to identify the registry.
So could a FixedBuf and a sliced FixedBuf be used with non-fixed io operations like read
and write
? It can be now because FixedBuf implements IoBuf and IoBufMut. And all the io operations take a BoundedBuf{Mut} trait buffer.
Maybe io operations would still have to take BoundedBuf or BoundedBufMut and does that negate enough of the benefits that OwnedBuf{Mut} could bring us?
I think OwnedBuffer is lacking actually. Its is not sufficiently general to recover some types of buffer. I think that could be improved with an additional member state: *mut void
, to be used to hold state required to rebuild the original buffer type. that would permit an From instance from any type I believe, whilst allowing sufficient generality to recover.
(and yes, I know exactly why void pointers are generally frowned upon from a safety perspective)
The function pointer for the destructor and pointer type is starting to look a look like the dyn FixedBuffers
in FixedBuffer. So maybe that is telling us something.
An OwnedBuf is already 40 bytes and an OwnedCursor is 56. At what point is the size of the wrapper going to get in the way of being nice to work with. They are already large enough where getting them to be cache aligned could make a difference on systems with tight caches. At least both those types would have room for one more 8 byte field.
I think the semantics of OwnedBuf can be modified to be more general. This will introduce a type parameter, which may be undesirable?
trait IntoOwnedBuf {
type State = ();
type Restore = Self;
fn into_owned_buf(self) -> OwnedBuf<Self>;
}
struct OwnedBuf<T: IntoOwnedBuf> {
// Describe the buffer data
ptr: *mut u8,
len: usize,
capacity: usize,
// Provision for returning ownership to the original buffer.
fn: unsafe fn(&mut OwnedBuf<T>) -> <T as IntoOwnedBuf>::Restore,
state: <T as IntoOwnedBuf>::State
}
impl OwnedBuf<T> {
pub fn restore(self) -> <T as IntoOwnedBuf>::Restore {
let fn = self.fn;
fn(self)
}
}
impl Drop for OwnedBuf<T> {
pub fn drop(&mut self) {
let fn = self.fn;
let _ = fn(self);
}
}
Some thoughts
- provision is made to restore any type back to the original buffer, but it is not compulsory. By setting restore to (), it will just drop.
- The type parameter is necessary, for recovering the original buffer type.
- state could be as simple as a pointer to the original buffer, which we
mem::forget
whilstOwnedBuf
exists. - whilst in theory you generate a lot of code for every generic variant, the actual representation of
OwnedBuf
is likely to often be two pointers forfn
andstate
. I'd hope the compiler could rationalise that.
And I think we could still use input from @Noah-Kennedy and @nrc to know whether adding a filled parameter, one way or another to our traits or eventually concrete types is the correct direction. If it is the correct direction, we might as well bite the bullet soon, to clean up some APIs.
@FrankReh - It seems like OwnedBuf
represents additional advancements from the discussions on Zulip. He also says that he hasn't "updated the traits yet" so I'm assuming that they will reflect the results of your conversations with him.
I think it's safe to assume that at least the basic filled
functionality will be present. I'm not sure about the more advanced slicing bits though. I've gone ahead and explicitly asked the question to be sure.
@ollie-etl
This will introduce a type parameter, which may be undesirable?
It is undesirable, but that doesn't mean it's not worth it. Now is the time to have the conversation :)
At first I thought that it might be too excessive, but the IntoOwnedBuf has additional utility.
My biggest concern is threading. The destructor would need to have some mechanism for invoking the recycle operation in the originating thread, or we need to send the buffer back to the originating thread. Or any Recycling pool needs to be Send + Sync + 'static
@rrichardson I'll admit I haven't given threading any thought. I had quite a blinkered view of application of a single threaded ring when I sketched this
@ollie-etl @rrichardson I've mentioned this to Noah too. I don't want the single threaded case to suffer much if the crate is trying to support multi threaded some day, but if it does, someone is going to want a feature to get the single-threaded performance back.
@ollie-etl @FrankReh - If we want to not disrupt the single-thread performance, and also avoid adding Send+Sync (+ 'static?) constraints to the IntoOwnedBuf trait, the dtor: fn(*const ())
approach might have to be it.
A person could do whatever gymnastics they needed with that *const ()
. It's not pretty, but I can't think of anything else that would work.
Outside of OwnedBuf, we are considering a Guard type for FixedBuf
that would make it Send
, and it'd do the right thing on drop
You still need state. There are valid buffer types which require drop logic. (Mmap being one). Without state, I can't make an mmap an owned buffer, because I can't call unmap on drop
You could lose the type parameter though, and just make restore completely unsafe, and restore to any type. Or have unsafeownedbuffer, which does that, and a generically typed ownedbuffer wrapping it, for those that want type safety
You still need state.
Absolutely. This should be stored and added as a param to the dtor.
You could lose the type parameter though, and just make restore completely unsafe.
I'm working on a PR for Bytes
and am doing basically that. I don't think it's avoidable. The best you could do would be store the TypeId on construction, then validate it when converting back to the original. It's not worth it, though, IMO.
Or have unsafeownedbuffer, which does that, and a generically typed ownedbuffer wrapping it.
That's an interesting idea. It'd be a nice idea for the paranoid, but in most cases, the variety of buffers will be small, so accidentally converting to the wrong buffer seems low.
If you look at all of the frameworks in the Tokio ecosystem that take only Bytes
I couldn't imagine them changing all of their APIs to be parameterized over the type of 3rd party buffer that Bytes is encapsulating.
Closing as I am not inclined to pursue, and appear to be in a minority.
I think this a good discussion, worth keeping open, even if we don't know the next step yet.