gdext Async/Await for Signals

It should be possible to implement Future for signals, and add support for async functions in exported methods, with them returning a signal.

These two things combined should allow us to use async/await syntax when calling methods both in rust and gdscript.

This is going to require a lot of upfront work before it's feasible to start working on this feature. First an actual implementation of signals are needed, but we'll also likely need to create some godot-objects directly in our bindings that will handle the awaiting and execution of these futures, and that is not something we have planned/thought out how to do yet.

May 07 '23 20:05 lilizoey

I implemented a POC for a SignalFuture and minimal async runtime that allows to await signals. https://github.com/godot-rust/gdext/commit/5891c5bb2a2ca15e9a1ba904797bda608dcc997d

It still has two issues which I did not bother to resolve yet:

~~it's currently not possible to spawn nested tasks. It should be possible to resolve that, but I'm not sure how important it actually is.~~ EDIT: has been resolved.
Only Signal arguments that are Sync + Send can currently be used in the async context. This makes it impossible to access any Gd<T> coming from a signal.

Aug 14 '24 12:08 TitanNano

@TitanNano I think your implementation doesn't work correctly if the Callable is called on another thread than where the task has been created.

https://github.com/jrb0001/gdext/commit/74afb5b4b58a0c1cfee6e7f1d9a48a22dc57d9b0 is a variant without the thread-local, but it needs some unsafe and panics when called from another thread. It should also fix recursively creating tasks, but I didn't test it.

https://github.com/jrb0001/gdext/commit/a27d0bc384e27238410df7ac6200ed6874bfca0c is a third, more complex variant which has both !Send and Send futures. It can only be called by another thread if both the task and the future are Send. Otherwise it panics or doesn't even compile.

Sync isn't needed here because polling a Future needs a &mut so only one thread can access it at a time anyway.

Aug 24 '24 21:08 jrb0001

@jrb0001 yes that is true. I ignored that so far. Have you encountered a case were the callable was called on an other thread?

https://github.com/jrb0001/gdext/commit/74afb5b4b58a0c1cfee6e7f1d9a48a22dc57d9b0 is a variant without the thread-local, but it needs some unsafe and panics when called from another thread. It should also fix recursively creating tasks, but I didn't test it.

I don't see the benefit of this if it still panics.

https://github.com/jrb0001/gdext/commit/a27d0bc384e27238410df7ac6200ed6874bfca0c is a third, more complex variant which has both !Send and Send futures. It can only be called by another thread if both the task and the future are Send. Otherwise it panics or doesn't even compile.

Sync isn't needed here because polling a Future needs a &mut so only one thread can access it at a time anyway.

Gd<T> is neither Send nor Sync so as soon as either of them is required the future becomes much less useful since you can't access any Gd<T> inside of it.

Aug 24 '24 22:08 TitanNano

@jrb0001 yes that is true. I ignored that so far. Have you encountered a case were the callable was called on an other thread?

No, I spotted it while reading through it.

jrb0001@74afb5b is a variant without the thread-local, but it needs some unsafe and panics when called from another thread. It should also fix recursively creating tasks, but I didn't test it.

I don't see the benefit of this if it still panics.

The panic is guaranteed while your version can run another future which was assigned the same ID on the thread of the signal.

jrb0001@a27d0bc is a third, more complex variant which has both !Send and Send futures. It can only be called by another thread if both the task and the future are Send. Otherwise it panics or doesn't even compile.

Sync isn't needed here because polling a Future needs a &mut so only one thread can access it at a time anyway.

Gd<T> is neither Send nor Sync so as soon as either of them is required the future becomes much less useful since you can't access any Gd<T> inside of it.

LocalSignalFuture of the third variant works with Gd<T> both as signal argument and when captured by a closure with the limitation that the future is !Send and panics if the signal happens on another thread. I think that solves the main usecase?

SendSignalFuture of the third variant is less useful, I agree. Maybe it makes sense to merge them and select the implementation based on the experimental-threads feature?

Aug 25 '24 07:08 jrb0001

The panic is guaranteed while your version can run another future which was assigned the same ID on the thread of the signal.

This can easily be solved by adding the ThreadId into the Waker.

LocalSignalFuture of the third variant works with Gd<T> both as signal argument and when captured by a closure with the limitation that the future is !Send and panics if the signal happens on another thread. I think that solves the main usecase?

But it adds a lot of complexity without solving any problems (at least as far as I'm aware). Correctly, panicking can be solved like I mentioned above.

SendSignalFuture of the third variant is less useful, I agree. Maybe it makes sense to merge them and select the implementation based on the experimental-threads feature?

Even with extermental-threads Gd<T> is still not thread-safe. Without access to engine managed objects, I don't see much value for a Send + Sync future. But we can even offer a thread-safe task spawn function which either moves the future into the Waker or uses a static (i.e. not thread-local) ASYNC_RUNTIME.

I have been thinking about the stated concern of signals being emitted on other threads, and I believe the correct solution would be to redirect the wakers future polling into the correct thread. Unfortunately, this is not possible as far as I can see. Godot does not expose any way to queue callable for a specific thread. Generally, there is also too little control over where things get executed to come up with a reliable approach.

So the alternatives for now that I can see are:

a) limit async tasks to the main thread by panicking when trying to spawn a task on another thread. With this limitation in place, we can connect to signals in deferred mode inside the Signal Future and reliably run the callable on the main-thread.

b) properly panic when the callable runs on a different thread than the future and leave the rest up to the user. This is more flexible but much less predictable, so I think a) is probably preferred.

Once there is a thread-safe version of Gd<T> we can add a way to spawn Send + Sync futures and poll them on whatever thread the signal is emitted. This should be quite trivial.

Aug 25 '24 18:08 TitanNano

There are seem to be some issues related to emitting a signal from a callable connected to itself. This will call the Callable a second time and its Mutex is already locked. From looking at Godot source, it seems that oneshot connections are disconnected after calling the Callable. So it needs to either ignore calls while the Mutex is locked or use deferred mode.

I also tried implementing futures_lite::Stream and the only way to avoid the issue there is to use deferred mode, as far as I can see.

So if deferred mode signals always run on the main thread, then that's probably the best option.

I also ran into an issue with the async-channel crate which deadlocks when the receiving Future is polled from inside the Waker (also caused by a Mutex). std::task documentation doesn't prohibit our behavior, but I wouldn't be surprised if other crates have similar issues. There can be quite some recursion if there is a chain of futures, each doing something that wakes up the next one. I think it would be better to simply avoid calling user code from inside the waker, if that's even possible.

I ended up using an Autoload with a smol::LocalExecutor, ~10 lines. Maybe there is some other way to execute code every frame, without adding a node or registering a class?

Aug 25 '24 20:08 jrb0001

There are seem to be some issues related to emitting a signal from a callable connected to itself. This will call the Callable a second time and its Mutex is already locked. From looking at Godot source, it seems that oneshot connections are disconnected after calling the Callable. So it needs to either ignore calls while the Mutex is locked or use deferred mode.

I think this was changed in 4.3. In 4.2 it's disconnected after calling and in 4.3 it's disconnected first.

Aug 25 '24 22:08 TitanNano

I think I found another two issues that affect all our implementations:

Dropping the Future doesn't disconnect the Callable.
Disconnecting the Callable causes the Future to get stuck forever.

Disconnecting requires the Callable so we need to keep it around somewhere in the Future. Which means that its internal refcount will never drop to zero so the Callable is never dropped. I don't know any other way to reliably detect a disconnected signal.

Aug 26 '24 20:08 jrb0001

Disconnecting requires the Callable so we need to keep it around somewhere in the Future. Which means that its internal refcount will never drop to zero so the Callable is never dropped. I don't know any other way to reliably detect a disconnected signal.

We can use a WeakRef I think.

I also ran into an issue with the async-channel crate which deadlocks when the receiving Future is polled from inside the Waker (also caused by a Mutex). std::task documentation doesn't prohibit our behavior, but I wouldn't be surprised if other crates have similar issues. There can be quite some recursion if there is a chain of futures, each doing something that wakes up the next one. I think it would be better to simply avoid calling user code from inside the waker, if that's even possible.

We might want to connect to the signal in the normal way and schedule a deferred callable to actually poll the future. But I haven't tried it yet.

Aug 26 '24 22:08 TitanNano

Disconnecting requires the Callable so we need to keep it around somewhere in the Future. Which means that its internal refcount will never drop to zero so the Callable is never dropped. I don't know any other way to reliably detect a disconnected signal.

We can use a WeakRef I think.

Isn't that only for RefCounted? Unfortunately Callable are their own type, not even an Object.

I also ran into an issue with the async-channel crate which deadlocks when the receiving Future is polled from inside the Waker (also caused by a Mutex). std::task documentation doesn't prohibit our behavior, but I wouldn't be surprised if other crates have similar issues. There can be quite some recursion if there is a chain of futures, each doing something that wakes up the next one. I think it would be better to simply avoid calling user code from inside the waker, if that's even possible.

We might want to connect to the signal in the normal way and schedule a deferred callable to actually poll the future. But I haven't tried it yet.

I had that idea too but didn't find a generic way to schedule it. There is Callable::call_deferred but that isn't exposed to gdextension as far as I can see. Maybe something like callable.to_variant().call("call_deferred", args) but not sure.

Aug 27 '24 06:08 jrb0001

Isn't that only for RefCounted? Unfortunately Callable are their own type, not even an Object.

Yeah right I missed that.

Disconnecting requires the Callable so we need to keep it around somewhere in the Future. Which means that its internal refcount will never drop to zero so the Callable is never dropped. I don't know any other way to reliably detect a disconnected signal.

Why do you believe it will never be dropped? As soon as the future is dropped, the callable should be dropped as well, shouldn't it?

I had that idea too but didn't find a generic way to schedule it. There is Callable::call_deferred but that isn't exposed to gdextension as far as I can see. Maybe something like callable.to_variant().call("call_deferred", args) but not sure.

Looks like Callable::call_deferred is only available as a varcall and those are not implemented for built-in types. I will see if your suggestion work and we should see that Callable::call_deferred is properly implemented as a varcall.

Aug 27 '24 08:08 TitanNano

I added the handling of the following cases to my implementation:

godot_task should panic if the current thread is not the main thread.
Waker::wake should panic if the current thread ID doesn't match the original one.
The future is now polled inside a deferred callable, so waking and polling are no longer happening in the same call stack. This way, signals emitted on a different thread should also be redirected to poll the future on the main thread now.
SignalFuture disconnects its callable if the signal hasn't been emitted when the future is dropped.

https://github.com/godot-rust/gdext/compare/7634fe769d1fcb66209586f0b6c06aac40978253...4c2d9a85a0ea29fc0ade525e8538c216cd792c34?expand=1

EDIT: added handing for nested godot_tasks and clean-up when deinitializing. https://github.com/godot-rust/gdext/compare/7634fe769d1fcb66209586f0b6c06aac40978253...341a40c0a144e57edc8392090a55881b2b15f50a

Aug 27 '24 09:08 TitanNano

Disconnecting requires the Callable so we need to keep it around somewhere in the Future. Which means that its internal refcount will never drop to zero so the Callable is never dropped. I don't know any other way to reliably detect a disconnected signal.

Why do you believe it will never be dropped? As soon as the future is dropped, the callable should be dropped as well, shouldn't it?

If the Future is dropped, then yes, it will work as intended. But if the signal is disconnected (freed object / through code), then the Future will wait for a wake-up which will never come.

Aug 27 '24 19:08 jrb0001

If the Future is dropped, then yes, it will work as intended. But if the signal is disconnected (freed object / through code), then the Future will wait for a wake-up which will never come.

Yes, if someone goes out of their way to disconnect the callable, then the future will never complete. I don't see how this can be avoided. I wonder how this behaves in GDScript.

Aug 27 '24 19:08 TitanNano

If something intentionally disconnect a callable it didn't connect, then we can blame that. But can we expect every code that calls Node::queue_free(), decrements a RefCounted or otherwise frees an object to check if there is any callable connected to any signal on that object?

It's possible to detect the freed object case by polling Signal::is_null() but that's too heavy to do by default.

I did a quick test and it looks like gdscript await also gets stuck, or at least doesn't continue with the next statement.

Aug 27 '24 20:08 jrb0001

@jrb0001 I pretty much had the same ideas and came to the same conclusion. Getting a callback on object deletion is just not possible, and checking all pending futures every frame is excessive.

A middle ground solution that came to me could be: If the AsyncRuntime Vec is filled to capacity (.len() == .capacity()) when adding a new task, then we could go through the pending futures and try to garbage collect some of them before extending the capacity.

Alternatively, we could always check for dead futures during insert, since we already scan the buffer for empty slots anyway.

Aug 28 '24 10:08 TitanNano

I don't think we can know when a future is dead, though. If we want to keep this runtime agnostic (and I strongy believe we should) we have no information when a future will stop making progress.

Best we can do is to return a TaskHandle from godot_task and let the caller manually cancel the task.

Aug 28 '24 12:08 TitanNano

It looks like it is possible to use a new Callable for disconnecting, as long as they are equal and hash() returns the same value.

Signal disconnected or object freed:

Connected Callable is dropped automatically.
Notify Future.

Future dropped:

Do nothing if signal.object().map(|o| o.is_instance_valid()) != Some(true) (freed object).
Use second Callable to disconnect the signal.
Connected Callable is dropped automatically.

Aug 28 '24 18:08 jrb0001

Notify Future.

How do you do this?

Future dropped:

Do nothing if signal.object().map(|o| o.is_instance_valid()) != Some(true) (freed object).

Use second Callable to disconnect the signal.

Connected Callable is dropped automatically.

I already implemented this.

Aug 28 '24 18:08 TitanNano

Notify Future.

How do you do this?

You probably need to change the future to return Option<T>/Result<T, _>, unless you want to let it panic on disconnect. Then inside the Drop impl just write into your state and wake the Waker, just like you do when called from Godot.

Aug 28 '24 19:08 jrb0001

You probably need to change the future to return Option<T>/Result<T, _>, unless you want to let it panic on disconnect. Then inside the Drop impl just write into your state and wake the Waker, just like you do when called from Godot.

Let's assume we can create a custom callable which wakes the future both on invocation and on Drop. The callable also can be cloned and maintains the same hash, so we don't have to reference the engine callable in the future.

New references to the callable can still be created via signal.get_connections(). So Object.free() != drop(callable), we can't rely on the callable being dropped when the object is freed.

Or do you see another way to detect when the object is freed?

Aug 28 '24 21:08 TitanNano

True, as long as some other code keeps the reference it got from signal.get_connections(), the Callable isn't dropped. Such code could also call it directly or connect it to some other signal and I think that could do more damage than just a stuck Future.

We can't guarantee much in presence of malicious code (except the basic rust guarantees), but in my opinion best-effort detection of stuck futures is better than not having anything at all.

The guaranteed way to detect a freed object is through something like this:

let object: InstanceId = signal.object_id().unwrap();
...;
let is_freed: bool = Gd::try_from_instance_id(object).is_err();

It requires polling which can get expensive if there are hundreds/thousands of waiting futures and it doesn't help in the signal.disconnect() case. The Drop impl approach is much cheaper and should be good enough in most situations.

Aug 29 '24 19:08 jrb0001

I have now added:

A TaskHandle that can be used to cancel a task and check if the task is still pending.
A second GuaranteedSignalFuture that resolves to an Option<R> and wakes the future if the Callable is dropped.

https://github.com/godot-rust/gdext/compare/7634fe769d1fcb66209586f0b6c06aac40978253...a9d4f0e5bd3f7d4ba6231f1dcc00145d4ae3723e

Let's see how well this all works. I'm also open to suggestions for a better name, for the second future.

@Bromeon are you interested in including this into the project? I will start testing this in my project now, but I happy to promote it to a PR.

Aug 29 '24 21:08 TitanNano

Very cool to see progress on this front! :+1:

@TitanNano I'm trying to catch up, can you maybe summarize how the user API of this might look? Any notable trade-offs/downsides? Can it be feature-gated?

Aug 31 '24 18:08 Bromeon

@Bromeon sure, so far, we have the following:

User-facing APIs

godot_task(...) This should be used to create a new async task on our runtime. The TaskHandle can be used to check if the task is still in progress and cancel it if desired.
TaskHandle::is_pending() returns a bool indicating whether the task is still in progress or has been completed.
TaskHandle::cancel() allows canceling a pending task.
ToSingalFuture::to_future() creates a new SignalFuture from an existing signal.
ToGuaranteedSignalFuture::to_guaranteed_future() creates a new GuaranteedSignalFuture.
SignalFuture resolves when the corresponding signal is emitted.
GuaranteedSignalFuture resolves when the corresponding signal is emitted or the object of the signal is freed. This future resolves to an option.

Trade-offs

godot_task(...) accepts any future and does not require them to be Send + Sync. This has been chosen to allow non Send + Sync values to be moved into the future, like Gd<T>. The consequence of this is that it can only be used on the main-thread and will panic on any other thread. Unfortunately, signals can be emitted from any thread and the engine only provides APIs to move execution back to the main-thread. Our non thread safe futures have to remain on their origin thread, and we can only do this on the main-thread. When thread-safe alternatives for Gd<T> and other types exist, we can use thread-safe futures and have more flexibility in how and where we execute them.
SignalFuture could potentially never resolve if the object of the signal is freed while the future exists. The corresponding task has to be canceled manually.
GuaranteedSignalFuture also resolves when the object of the signal is freed. This is not 100% guaranteed, though. Users can acquire a reference to the signal connection of the future via Signal::get_connections() and the future will not resolve until all references to the connected Callable are dropped.
Both futures can only resolve to Send + Sync values. Signal arguments have to be passed through a Mutex because of the Send + Sync constraint of Callables. This makes it impossible to get a Gd<T>, Dictionary or Array from a signal.

Feature Gate

All changes are self-contained, so feature gating them should not be a problem.

Sep 01 '24 09:09 TitanNano

@Bromeon let me know if you got any questions 😃

Sep 30 '24 22:09 TitanNano