Reuse inner if possible?
I have a use case, within each second, thousands of oneshot channels are created and signaled, and then dropped.
I am thinking about if they can be reused somehow.
For each oneshot there is a new Inner allocated Arc::new(Inner::new()) on heap.
https://github.com/irrustible/async-oneshot/blob/main/src/lib.rs#L21
Can we somehow reuse the Inner in order to reduce heap allocation? Say if it is received, the Inner can be reset and used to create another pair of tx and rx?
Maybe. My local working copy I've been experimenting with some of this. The problem is that it makes it difficult to ensure safety (i.e. i have to expose quite a lot of unsafe api to enable it).
I'm open to suggestions if you have any particular ideas?
Actually, I should note that on my working copy, the arc is gone and replaced with a box because we already have atomic bitflags we can use.
@jjl Box causes heap allocation too. I think it may leave the choice to caller , to reuse the inner after it is received
yes, a box does cause heap allocation, however turning the arc into one moves us one step closer to an alloc-less api (because i would ultimately like to support no-std without alloc).
the problem is that something has to do the cleanup. in my local copy, if one side drops, the other side becomes responsible for cleanup. and then there's creation. by using alloc now, i'm able to enforce a lot of invariants (like "there will not be more than two threads accessing this at a time"). if you want to start doing reuse, well creating a new pair from existing data is going to be an unsafe operation right there because i can no longer guarantee it. and then i still have to support the existing methods of use (because most people do not care about the alloc).
so it's not as straightforward as it sounds, but i think we can work s omething out.
okay, i've pushed (most of) my local changes to the feature/no-arc branch so you can have some idea of what i'm talking about.
this is work in progress. but you can see the way i was thinking of evolving the api
Hrm, I think the following style of api is possible without needing unsafe:
impl<T> Sender<T> {
pub fn reestablish(&self) -> Result<Receiver<T>, ()>
}
And similar on the Receiver side. It can check if the other side is dropped. If so, it can proceed to reestablish. Otherwise you'd need both halves.
an alloc-less api
Is that possible? I really doubt that. But if it comes to true, it would outperform tokio's oneshot a lot.
Sure it's possible. Take a look at the WIP - the two halves use a pointer internally. That we happen to allocate it with Box::new is incidental - we can equally take a pointer of miscellaneous origin, provided it will survive as long as the two halves do. Like if you created it on the stack...
Ok, current plan is to base the new Sender and Receiver on the following type to hold a channel:
pub enum Holder<'a, T> {
/// Lifetime-bound reference
Ref(&'a T),
/// A pointer we do not own
BorrowedPtr(NonNull<T>),
/// A pointer originating from a box that between us
#[cfg(feature="alloc")]
OwnedBoxPtr(NonNull<T>),
}
I'm still a bit on the fence about the lifetime. I don't like to taint the Sender and Receiver types with it, especially given that today's async ecosystem doesn't really handle borrowing in nested tasks very well (it's quite a hard problem). That said, there's a trivial workaround, if such were were really considered required:
type Sender<T> = async_oneshot::Sender<'static, T>;
type Receiver<T> = async_oneshot::Receiver<'static, T>;
I don't know this extra option is particularly useful yet, but that said, I have some plans involving subtasks and borrowing on the horizon and I'm sure I'm not the only one.
Would this cover your use case adequately?
Still not sure exactly sure in which way you want to reuse it.
In my working copy, we now have three different means of reuse:
- Sending a message no longer automatically closes the channel, so it can be directly reused.
- Sender and Receiver can be put into
recyclemode. After they observe the other side has closed, you can callrecycle()on them and get back the other half back very cheaply. - Sender and Receiver can be put into
reclaimmode. During what would be a drop, we can stamp the atomic flags with a sentinel value that some code you've written can use to recognise the memory is free. Or if you just know you can use unsafe.
So I think whatever it is you were hoping for, I've probably given you it.
Create/drop is now 40-50% cheaper in my working copy because I've cut out the Arc, so a create/drop without using the channel is 4 atomics (and perhaps a box, if you quite sensibly aren't managing the memory yourself).
What I would note is that you can't magically assume that getting rid of the box will make everything a lot cheaper. Benching the new version with a box and with a reference took it from ~28.5ns to ~26ns on my machine. The benchmark is probably being quite generous to the allocator, but 2ns is frankly incredible allocator performance. And I'm just using musl's allocator, which while it isn't terrible any more, it's not as fast as e.g. jemalloc, tcalloc or mimalloc.
So uh yeah, we're really in the weeds of performance here. I've got my full no alloc support, but it hasn't given me quite the performance wins I was hoping for. The biggest win by far was removing the Arc (cutting out 4 relaxed atomics!)
I consider this will be fixed with #18 where I've added 3 different ways to reuse hatches. Now would be an excellent time to feed back on that PR.