async-rdma
New Access Traits
https://zhuanlan.zhihu.com/p/524967763
Hey, that's a good article and ingenious design.
So the main idea is to use types to limit access, transfer ownership of the buffer to the operation, and return it after completion to ensure safety?
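The pattern described in the question (move the buffer into the operation, get it back on completion) can be sketched in plain Rust. This is a hypothetical, synchronous illustration: `post_send` and `Completion` are made-up names, not the async-rdma API, and no NIC is involved.

```rust
/// Hypothetical completion record: the operation hands the buffer back
/// together with its result.
struct Completion {
    bytes: usize,
    buf: Vec<u8>,
}

/// Hypothetical send operation. Taking `buf` by value means the caller
/// cannot touch the buffer while the operation is in flight.
fn post_send(buf: Vec<u8>) -> Completion {
    // A real implementation would pass the buffer's address to the NIC and
    // wait for the work completion here; this sketch only records the length.
    let bytes = buf.len();
    Completion { bytes, buf }
}

fn main() {
    let buf = vec![1u8, 2, 3, 4];
    // `buf` is moved into the operation...
    let Completion { bytes, buf } = post_send(buf);
    // ...and is usable again only because the completion returned it.
    println!("sent {bytes} bytes; buffer length is {}", buf.len());
}
```

The cost of this design is that the caller gives up the whole buffer for the duration of the operation, even when the operation only needs to read it.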
We define three access traits here because even a `LocalMr` needs an `rkey` when it is sent to the remote end as a `RemoteMr`. That might seem a little strange.
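One way to read the three-trait split, sketched with hypothetical trait names (`MrAccess`, `LocalAccess`, and `RemoteAccess` are assumptions, not taken from the article or from async-rdma): a local region carries both an lkey and an rkey, so it can implement the remote-access trait and be turned into the descriptor that the peer sees as a `RemoteMr`.

```rust
/// Hypothetical base trait: anything that names a registered region.
trait MrAccess {
    fn addr(&self) -> u64;
    fn len(&self) -> usize;
}

/// Hypothetical trait for regions the local NIC may access (needs an lkey).
trait LocalAccess: MrAccess {
    fn lkey(&self) -> u32;
}

/// Hypothetical trait for regions a remote peer may access (needs an rkey).
trait RemoteAccess: MrAccess {
    fn rkey(&self) -> u32;
}

/// A locally registered region. It carries both keys because it may be
/// advertised to the peer, where it is seen as a `RemoteMr`.
struct LocalMr { addr: u64, len: usize, lkey: u32, rkey: u32 }

/// What the peer receives: only address, length, and rkey.
#[derive(Debug, PartialEq)]
struct RemoteMr { addr: u64, len: usize, rkey: u32 }

impl MrAccess for LocalMr {
    fn addr(&self) -> u64 { self.addr }
    fn len(&self) -> usize { self.len }
}
impl LocalAccess for LocalMr {
    fn lkey(&self) -> u32 { self.lkey }
}
impl RemoteAccess for LocalMr {
    fn rkey(&self) -> u32 { self.rkey }
}

/// Builds the descriptor sent to the remote end from any value that grants
/// remote access. The lkey never leaves the local side.
fn to_remote<M: RemoteAccess>(mr: &M) -> RemoteMr {
    RemoteMr { addr: mr.addr(), len: mr.len(), rkey: mr.rkey() }
}

fn main() {
    let mr = LocalMr { addr: 0x1000, len: 4096, lkey: 7, rkey: 42 };
    // The local side keeps using the lkey for its own work requests...
    println!("local lkey = {}", mr.lkey());
    // ...while only addr/len/rkey are advertised to the peer.
    println!("sent to peer: {:?}", to_remote(&mr));
}
```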
> So the main idea is to use types to limit access, transfer ownership of the buffer to the operation, and return it after completion to ensure safety?
The value being transferred is an access value, not an ownership value. If a value implements some of the access traits, it represents an access value.
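A minimal sketch of the distinction, assuming access values are modeled as borrows: the owner keeps the `Vec`, and each operation consumes only a value that implements a read or write access trait. `ReadAccess`, `WriteAccess`, `ReadView`, `WriteView`, `post_send`, and `post_recv` are hypothetical names used for illustration only.

```rust
/// Hypothetical trait: a value that grants read access to some bytes.
trait ReadAccess {
    fn bytes(&self) -> &[u8];
}

/// Hypothetical trait: a value that grants write access to some bytes.
trait WriteAccess: ReadAccess {
    fn bytes_mut(&mut self) -> &mut [u8];
}

/// Read access value: a shared borrow of memory owned elsewhere.
struct ReadView<'a>(&'a [u8]);
/// Write access value: an exclusive borrow of memory owned elsewhere.
struct WriteView<'a>(&'a mut [u8]);

impl ReadAccess for ReadView<'_> {
    fn bytes(&self) -> &[u8] { self.0 }
}
impl ReadAccess for WriteView<'_> {
    fn bytes(&self) -> &[u8] { &*self.0 }
}
impl WriteAccess for WriteView<'_> {
    fn bytes_mut(&mut self) -> &mut [u8] { &mut *self.0 }
}

/// Hypothetical send: needs only read access and consumes the access value.
fn post_send<A: ReadAccess>(src: A) -> usize {
    src.bytes().len()
}

/// Hypothetical receive: needs write access; copies `data` into the region.
fn post_recv<A: WriteAccess>(mut dst: A, data: &[u8]) -> usize {
    let region = dst.bytes_mut();
    let n = data.len().min(region.len());
    region[..n].copy_from_slice(&data[..n]);
    n
}

fn main() {
    let mut buf = vec![0u8; 4]; // the owner never gives this Vec away
    let received = post_recv(WriteView(&mut buf[..]), b"ab");
    // The write access value was consumed, so a read access value can now be made.
    let sent = post_send(ReadView(&buf[..]));
    println!("received {received} bytes, sent {sent} bytes, buf = {buf:?}");
}
```

Here the borrow checker enforces "many readers or one writer" for local access values; memory handed to a remote peer needs a runtime mechanism instead, because the compiler cannot see the peer.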
> We define three access traits here because even a `LocalMr` needs an `rkey` when it is sent to the remote end as a `RemoteMr`. That might seem a little strange.
We send a remote access value to the remote peer. A memory region can produce either multiple read access values or a single write access value. For example, the RDMA agent keeps a memory region alive and sends a "remote readable token" to the remote side. The region is then not writable locally until the agent confirms that the remote token has been destroyed.
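The rule in this reply (many read access values or one write access value, with local writes blocked until the agent confirms the remote token is gone) behaves like a reader/writer lock whose release is driven by messages from the peer. A minimal sketch, assuming a hypothetical `Region` record kept by the agent; the names are illustrative, blocking local reads while a remote writer exists is an added assumption, and no real RDMA calls are made.

```rust
/// Hypothetical per-region bookkeeping kept by the agent.
#[derive(Debug, Default)]
struct Region {
    remote_readers: usize, // "remote readable" tokens not yet confirmed destroyed
    remote_writer: bool,   // whether the single "remote writable" token is out
}

/// Returned when a grant would violate "many readers or one writer".
#[derive(Debug, PartialEq)]
struct Busy;

impl Region {
    fn grant_remote_read(&mut self) -> Result<(), Busy> {
        if self.remote_writer {
            return Err(Busy);
        }
        self.remote_readers += 1;
        Ok(())
    }

    fn grant_remote_write(&mut self) -> Result<(), Busy> {
        if self.remote_writer || self.remote_readers > 0 {
            return Err(Busy);
        }
        self.remote_writer = true;
        Ok(())
    }

    /// Called only when the peer explicitly confirms that a read token was
    /// destroyed; nothing else decrements the count.
    fn confirm_read_destroyed(&mut self) {
        self.remote_readers = self.remote_readers.saturating_sub(1);
    }

    /// Called only when the peer explicitly confirms the write token is gone.
    fn confirm_write_destroyed(&mut self) {
        self.remote_writer = false;
    }

    /// Assumption: local reads conflict only with a remote writer.
    fn locally_readable(&self) -> bool {
        !self.remote_writer
    }

    /// Local writes conflict with any outstanding remote token.
    fn locally_writable(&self) -> bool {
        !self.remote_writer && self.remote_readers == 0
    }
}

fn main() {
    let mut region = Region::default();
    region.grant_remote_read().unwrap();
    region.grant_remote_read().unwrap();
    // Two remote readers: still readable locally, but not writable.
    println!("readable: {}, writable: {}", region.locally_readable(), region.locally_writable());
    assert_eq!(region.grant_remote_write(), Err(Busy));

    // The agent receives both "token destroyed" confirmations from the peer.
    region.confirm_read_destroyed();
    region.confirm_read_destroyed();
    println!("writable after confirmations: {}", region.locally_writable());

    // A single remote writer excludes local access until it is confirmed gone.
    region.grant_remote_write().unwrap();
    assert!(!region.locally_readable());
    region.confirm_write_destroyed();
}
```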
I just realized that a timeout on a single side is unsound. We cannot ensure that the remote access value has been destroyed unless the remote side explicitly notifies the local side.
> I just realized that a timeout on a single side is unsound. We cannot ensure that the remote access value has been destroyed unless the remote side explicitly notifies the local side.
Yes, the timeout mechanism is unsound.
When using jemalloc, a timed-out `LocalMr` is part of a `RawMr`, which will not be deregistered immediately.
So the remote end can still access the timed-out remote MR if its timer is wrong. Software cannot detect this, and the NIC will not stop it.
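To make the "wrong timer" case concrete, here is a hypothetical timing model (the function and its parameters are illustrative assumptions, not part of async-rdma). The local side reuses the memory when its own lease timer fires; the remote side starts its timer only when the token arrives, and its timer may run long. The local side cannot observe either the delay or the remote timer's error, so under this model it cannot rule out an overlap on its own.

```rust
/// Hypothetical timing model, all values in milliseconds on the local clock.
/// - The local side sends the token at t = 0 and reuses the memory when its
///   own lease timer fires at `lease`.
/// - The remote side starts its timer when the token arrives at `delay`, and
///   that timer actually runs for `remote_lease` local milliseconds (this
///   equals `lease` only if the remote timer is accurate).
/// Returns how long the remote side may still access memory the local side
/// has already reused (0 means no overlap under this model).
fn unsafe_window(lease: u64, delay: u64, remote_lease: u64) -> u64 {
    let local_reuse_at = lease;
    let remote_expires_at = delay + remote_lease;
    remote_expires_at.saturating_sub(local_reuse_at)
}

fn main() {
    // Accurate remote timer, 5 ms network delay.
    println!("{} ms", unsafe_window(1000, 5, 1000));
    // Remote timer runs 1% slow, so its 1000 ms lasts 1010 local ms.
    println!("{} ms", unsafe_window(1000, 5, 1010));
}
```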