forest
forest copied to clipboard
Fix task cancelation
Issue summary
In Forest we use in a couple of places the Task::spawn
method to spin up new tasks.
Unfortunately for all of them expect the ones in daemon.rs
we don’t implement their cancellation making it possible that orphaned child tasks outlive parent tasks.
Any value moved to a future/ async block will never be dropped by the task unless the task has cleanly terminated or we are calling explicitly the cancel
method on the task handle.
This can result in memory leaks over the lifetime of the client and also hindering us to shutdown gracefully the node (rocksdb Rust wrapper do not always manage to drop itself).
Therefore we should:
- [ ] Use the new heaptrack doc to create a report on current memory usage/ leaks (https://github.com/ChainSafe/forest/pull/1819)
- [ ] Conduct any research that could solve the issue, ie:
- Tasks “structured concurrency"
- Use arbortable futures
- Other solution?
- [ ] Validate that forest can now be gracefully shutdown (https://github.com/ChainSafe/forest/issues/1587)
- [ ] Perform a second heaptrack analyse and observe if some of the leaks are gone (https://github.com/ChainSafe/forest/issues/1546)
Other information and links
https://docs.rs/async-std/latest/async_std/task/index.html# https://docs.rs/futures/latest/futures/future/struct.Abortable.html https://blog.yoshuawuyts.com/async-cancellation-1 (for “structured concurrency”)
Relevant code where we use spawn without canceling
- chain_muxer
- tipset_syncer
- https://github.com/ChainSafe/forest/blob/main/blockchain/consensus/fil_cns/src/validation.rs#L125
- https://github.com/ChainSafe/forest/blob/main/node/forest_libp2p/src/service.rs#L260
- state_manager
- https://github.com/ChainSafe/forest/blob/main/node/forest_libp2p/src/service.rs#L260
Could you add links to where we use spawn
without canceling or joining the task?
I've been testing in a standalone sample the futures::future::Abortable
type and it looks like it could solve our issue:
- When the
AbortHandle
is triggered this drops properly all the values moved to the future. - It's runtime independent (async-std, tokio) since it works with futures.
- Could be used to abort long running futures as well, like this one.
What we would need is some registry (could be part of the state_manager
) that can collect for each spawn future its associated handle.
When SIGINT is catched, we just call abort()
on all handles.
Note that we should move to tokio tasks and in this case using JoinSet
for handling proper task cancelation (as mentionned here).
Once done I think this issue can be repurposed.