async-ucx icon indicating copy to clipboard operation
async-ucx copied to clipboard

Cancellation safety

Open Nugine opened this issue 4 years ago • 6 comments

Worker::tag_recv accepts &mut [MaybeUninit<u8>] as a data buffer.

https://github.com/madsys-dev/async-ucx/blob/d38238742b714c0a1ed4ae571d0a8175883eb226/src/ucp/endpoint/tag.rs#L5-L8

However, a future can be cancelled(dropped) at any time. While the IO operation is still in progress, the data buffer may be used after free. It's an unsoundness issue.

Nugine avatar Aug 01 '21 04:08 Nugine

Yes, this issue needs to be fixed in the future.

One possible way is to call ucp_request_cancel in the drop function.

wangrunji0408 avatar Aug 02 '21 04:08 wangrunji0408

The data buffer is invalid once the drop function is called. Any IO operation should stop using the buffer immediatiley. Does ucp_request_cancel have such an effect?

Nugine avatar Aug 02 '21 05:08 Nugine

I think it does. But I can't find proof in the UCX docs.

wangrunji0408 avatar Aug 02 '21 05:08 wangrunji0408

This line requires a mutable exclusive access. But the completion callback may be called in another thread at any time. Is there any guarantee for the &mut Request?

https://github.com/madsys-dev/async-ucx/blob/d38238742b714c0a1ed4ae571d0a8175883eb226/src/ucp/endpoint/mod.rs#L131

Nugine avatar Aug 02 '21 05:08 Nugine

Oh in fact an immutable reference is enough, because the AtomicWaker guarantees thread safety.

And it is also safe to get mutable reference here (but don't pass it out). Because both Worker and Endpoint are !Send, IO functions and their callbacks are running on the same thread.

wangrunji0408 avatar Aug 02 '21 06:08 wangrunji0408

Yes, this issue needs to be fixed in the future.

One possible way is to call ucp_request_cancel in the drop function.

ucx_request_cancel can't guarantee cancellation safety.

  • ucx_request_cancel doesn't cancel send requests. https://github.com/openucx/ucx/issues/1162
  • If tag recv request is posted to transport layer, ucp_request_cancel need wait request completion. Buffer may still be used during cancellation.

yiyuanliu avatar Sep 13 '21 12:09 yiyuanliu