wasmtime icon indicating copy to clipboard operation
wasmtime copied to clipboard

wasi-io: Reimplement wasi-io/poll using a Pollable trait

Open badeend opened this issue 6 months ago • 12 comments

Prior discussion: https://bytecodealliance.zulipchat.com/#narrow/stream/217126-wasmtime/topic/Change.20Subscribe.20trait

  • Renamed the existing Pollable struct to PollableResource
  • Reimplemented wasi-io/poll. This introduces a new Pollable trait which is lower level, doesn't require heap allocations to poll, has mutable access to the WasiView, and can be used as a standalone resource without a parent. The Subscribe trait is kept intact, but this is now a utility interface, implemented in terms of Pollable.
  • Eliminate the (now) unnecessary surrogate parent resource of clock pollables
  • Added ResourceTable take & restore as a general purpose replacement for iter_entries. That one was used only by the old poll implementation.

Additionally:

  • @pchickey Forbid empty poll list. Fixes: WebAssembly/wasi-io#67

badeend avatar Jan 24 '24 20:01 badeend

Subscribe to Label Action

cc @peterhuene

This issue or pull request has been labeled: "wasi", "wasmtime:api"

Thus the following users have been cc'd because of the following labels:

  • peterhuene: wasmtime:api

To subscribe or unsubscribe from this label, edit the .github/subscribe-to-label.json configuration file.

Learn more.

github-actions[bot] avatar Jan 24 '24 21:01 github-actions[bot]

Thanks for the feedback. I agree on the "panicky" point. I'll add an error type and remove the panics.

One thing that's not in this PR, but I assume will most likely be added at some point, are untyped take_any and restore_any variants. The drawback of reverting the Lease & SlotIdentity design, is that the restore(_any) API becomes (even) easier to misuse. Because then the consumer can restore any value at the index of a previously differently typed entry. I'm worried about the developer experience of this, as the corruption would happen silently and the place where they encounter the WrongType errors could be miles away from where the problem actually is.

Anyway, I'm fine with your suggestions. I just wanted to make sure the trade-offs are known.

badeend avatar Jan 26 '24 20:01 badeend

Hm ok, for that I think you have a good point about accidentally messing up these APIs. I think this may still be surmountable perhaps with some trickery, but I'd also need to see the usage of take_any and restore to know better. Want to discuss on a future PR with that implemented or hash it out here? (I'm fine either way)

alexcrichton avatar Jan 26 '24 21:01 alexcrichton

I chose to go with a hybrid approach. For the public API, I changed it to what you suggested. Internally, I removed Lease & SlotIdentity. But I kept Slot to perform the resource type check. Also, as part of the updated design (see above) I needed take_any and restore_any so I included those as well.

badeend avatar Feb 01 '24 16:02 badeend

Reading over this and see how this all turned out, I'm personally starting to get second thoughts on this. We're effectively reimplementing our own Future trait and as I'm sure you've seen we start implementing our own primitive functions (e.g. poll_fn) as well as we can't use standard things like async fn or #[async_trait]. I'm a bit worried that the direction this is taking us is straying off the path of maintainability for async support as things get more advanced over time.

Now that's all easy to say but this PR is still solving a concrete problem which is letting implementations access resources while polling, so I don't think simply closing this PR is an option. That being said after having read over this I wonder if there's perhaps an alternative implementation route that we can take.

Originally when designing Future-the-trait we ran into this issue of situations wanting to pass more context along through the poll method but the context doesn't survive longer than a single call to poll. To do that we ended up creating task-local variables which are like thread locals but instead stick with a task. That doesn't solve the immediate problem at hand though since you want mutable access, not just readable access.

To solve the mutability problem I realized that the take/restore bits look like Option<T> and so they've already got runtime state associated with them. One alternative would be to use a RefCell<T> instead and effectively repurpose that runtime state. That would enable acquiring &mut T from &ResourceTable so long as it's done "correctly" which is basically already the situation we have today (make sure you restore after you take).

How would you feel about something like that? We could still preserve get_mut as a method which has no runtime overhead (apart from storage space) but I'm imagining that a borrow_mut() method would be added. I'll note that get would have to go away in this world and be replaced with borrow() as a consequence, which likely affects code we have today.

While RefCell is unlikely to win any award for being the most ergonomic thing in the world this feels like it might provide a better tradeoff because we wouldn't fall off the well-trodden-path of Rust async into custom traits and such. I would want to make sure it works for your use case though.

I also realize though that you've probably already put in a great deal of work to this PR with 2 versions now so I'm hesitant to ask for a third. I'd be happy to help sketch this out and do some of the refactoring work to see if I feel like it's going to pay off.

alexcrichton avatar Feb 05 '24 17:02 alexcrichton

I think you mean changing

async fn ready(&mut self);

to

async fn ready(&mut self, table: &ResourceTable);

right?

That doesn;t work because &ResourceTable is not Send as ResourceTable is not Sync.

badeend avatar Feb 05 '24 22:02 badeend

Good point, yes, I'm more-or-less saying we should do that. (either that or use a task-local but I think that still captures &T).

Mind trying make ResourceTable implement Sync? I think that's probably the addition of a few trait bounds in its internal trait objects. I think everything we put in there is already Sync although if it things aren't currently Sync that'll pose a larger problem.

alexcrichton avatar Feb 09 '24 11:02 alexcrichton

I understand your concerns, yet I'd rather not go for round three right now, which would include reverting https://github.com/bytecodealliance/wasmtime/pull/7802. So instead, I've changed the questionable types to be private to the poll.rs module. That way, all the iffy-ness is contained to just a single file that we can iterate on later. From the outside nothing significant has changed, except that now I can use poll_ready_fn, which is what I personally was after.

Hope that's OK for you

badeend avatar Feb 11 '24 13:02 badeend

Wanted to say I have not forgotten about this, I have been looking for time to write up something longer-form, which I hope to get to by tomorrow. Is this blocking anything though that it would be prudent to land now rather than later? If so I think it's good to go as-is, but otherwise I'd like to take some more time to write up longer-form thoughts.

alexcrichton avatar Feb 13 '24 21:02 alexcrichton

There's no immediate rush from my side, so feel free to take your time.

badeend avatar Feb 13 '24 21:02 badeend

Ok thanks again for your patience here, very much appreciated!

I've gotten some time to think and work on this. I was leaning towards merging this, but then I realized that I'd prefer to avoid a situation where we land this and then later revert most of it towards a different strategy. In that sense I wanted, time permitting, to take a moment and figure out if alternative strategies would work. I'm getting a growing sense of unease with this direction as it's more-or-less a custom Future trait and is something we'd ideally avoid.

So assuming that the main goal of this PR is to get access to ResourceTable during async fn ready I originally suggested the borrow/borrow_mut idea above using RefCell. I tried implementing that and turns out it doesn't work. That means that ResourceTable contains RefCell and async fn ready would close over &ResourceTable (e.g. it's a new function argument). In such a situation it means that the returned future, which must be Send, closes over &ResourceTable. That type is not Send because it requires ResourceTable: Sync which is not satisfied with RefCell. So that cans the idea of. using RefCell.

After talking a bit more with @pchickey, however, I'm growing more fond of the idea of using RwLock<T> here instead of RefCell<T>. Not for the actual blocking aspect but instead only for the "it's Sync" aspect. To that end I implemented this on a branch and got tests passing with it. The changes are:

  • Replace ResourceTable::get with ResourceTable::{borrow, borrow_mut} that return RwLock{Read,Write}Guard<T>.
  • Remove table methods returning Any
  • Add &ResourceTable as an argument to the ready async function.
  • Replace most usage of table().get() with table().get_mut() (avoids locks)
  • Use u32 indices in Pollable's make_future internals instead of Any
  • Rewrite headers in wasi-http to avoid needing Any by representing headers as Resource<Resource<hyper::HeaderMap>>

The major consequences of this decision, however, are:

  • ResourceTable::{borrow, borrow_mut} require atomic manipulations. No blocking, but it's atomics for something that's not contended 99.9% of the time.
  • The std::sync::RwLock type cannot be used because std::sync::RwLock{Read,Write}Guard is not Send. I temporarily added a tokio dependency to wasmtime-the-crate and used tokio::sync::RwLock instead. Long-term I would like to avoid a tokio dep in the wasmtime crate.
  • There's a few minor cleanup still to be had in terms of threading a few more errors in a few more places.

Personally I'm inclined to take a route that looks like this, namely threading arguments through async fn rather than threading arguments through fn poll. This is a foundational change to how things work though, especially around a new footgun of not being able to borrow_mut twice. In that sense I'd like to get feedback along the lines of:

  • @pchickey does this all sound reasonable enough to you?
  • @badeend does this still solve your original use case, and if so what do you think about this approach vs the poll approach?

alexcrichton avatar Feb 14 '24 18:02 alexcrichton

also cc @elliottt since you've touched a lot of WASI internals and you probably want to take a look too

alexcrichton avatar Feb 14 '24 18:02 alexcrichton