async-rdma

New Access Traits

Open Nugine opened this issue 2 years ago • 5 comments

https://zhuanlan.zhihu.com/p/524967763

Nugine avatar Jun 06 '22 10:06 Nugine

Hey, that's a good article and an ingenious design. So the main idea is to use types to limit access, transfer ownership of the buffer to the operation, and return it after completion to ensure safety? We define three access traits here because even a LocalMr needs an rkey when it is sent to the remote end as a RemoteMr. That might seem a little strange.

GTwhy avatar Jun 10 '22 06:06 GTwhy

> So the main idea is using type to limit the access and transfer the ownership of buf to ops and return it after completion to ensure safety?

The value being transferred is an access value, not an ownership value. Any value that satisfies some of the access traits represents an access value.
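As a rough illustration of "access values" passed into an operation and handed back on completion, here is a minimal sketch. The trait names `ReadAccess` and `WriteAccess`, the types `OwnedBuf` and `SharedBuf`, and the functions `fake_send` and `fake_recv` are all hypothetical and are not taken from the article or from async-rdma; the "operations" only copy bytes in memory.

```rust
use std::sync::Arc;

/// Hypothetical trait: the value grants read access to some bytes.
trait ReadAccess {
    fn bytes(&self) -> &[u8];
}

/// Hypothetical trait: the value additionally grants write access.
trait WriteAccess: ReadAccess {
    fn bytes_mut(&mut self) -> &mut [u8];
}

/// An exclusively held buffer: readable and writable.
struct OwnedBuf(Vec<u8>);

impl ReadAccess for OwnedBuf {
    fn bytes(&self) -> &[u8] {
        &self.0
    }
}

impl WriteAccess for OwnedBuf {
    fn bytes_mut(&mut self) -> &mut [u8] {
        &mut self.0
    }
}

/// A shared buffer: readable only, so it cannot be passed to `fake_recv`.
struct SharedBuf(Arc<Vec<u8>>);

impl ReadAccess for SharedBuf {
    fn bytes(&self) -> &[u8] {
        &self.0
    }
}

/// Stand-in for a send: takes the access value by value and returns it
/// together with the number of bytes that would have been sent.
fn fake_send<T: ReadAccess>(src: T) -> (usize, T) {
    let n = src.bytes().len();
    (n, src)
}

/// Stand-in for a receive: requires write access, fills the buffer from
/// `data`, and hands the access value back to the caller.
fn fake_recv<T: WriteAccess>(mut dst: T, data: &[u8]) -> (usize, T) {
    let n = data.len().min(dst.bytes().len());
    dst.bytes_mut()[..n].copy_from_slice(&data[..n]);
    (n, dst)
}
```

Because the operation holds the access value while it runs, the caller cannot touch the buffer until the value is returned; passing a `SharedBuf` to `fake_recv` is rejected at compile time.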

> We define three access traits here because even LocalMr needs rkey when it is used to send to remote end as a RemoteMr. That might seem a little strange.

We send a remote access value to the remote peer. A memory region can produce either multiple read access values or a single write access value. For example, the RDMA agent keeps a memory region alive and sends a "remote readable token" to the remote peer. The region is then not writable locally until the agent confirms that the remote token has been destroyed.
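The "multiple readers or a single writer" rule can be sketched as plain bookkeeping. `RegionState` and its methods are hypothetical names used only for illustration; no real memory region or network traffic is involved.

```rust
/// Hypothetical per-region bookkeeping: how many read tokens are out,
/// and whether the single write token is out.
#[derive(Default)]
struct RegionState {
    readers: usize,
    writer: bool,
}

impl RegionState {
    /// A read token may be issued as long as no write token exists.
    fn try_grant_read(&mut self) -> bool {
        if self.writer {
            return false;
        }
        self.readers += 1;
        true
    }

    /// A write token may be issued only when no token of any kind exists.
    fn try_grant_write(&mut self) -> bool {
        if self.writer || self.readers > 0 {
            return false;
        }
        self.writer = true;
        true
    }

    /// Called once a read token is confirmed destroyed.
    fn release_read(&mut self) {
        self.readers = self.readers.saturating_sub(1);
    }

    /// Called once the write token is confirmed destroyed.
    fn release_write(&mut self) {
        self.writer = false;
    }
}
```

In this sketch a "remote readable token" counts as one reader, so `try_grant_write` keeps failing locally until `release_read` has been called for every outstanding token.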

Nugine avatar Jun 10 '22 06:06 Nugine

I just realized that a timeout enforced from a single side is unsound. We cannot ensure that the remote access value has been destroyed unless the remote side explicitly notifies the local side.
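One way to picture the difference between a local timeout and an explicit notification is the sketch below. `RemoteTokens`, the `u64` token ids, and the method names are assumptions made for illustration; this is not how async-rdma tracks remote memory regions.

```rust
use std::collections::HashSet;

/// Hypothetical local record of tokens that were sent to remote peers.
#[derive(Default)]
struct RemoteTokens {
    outstanding: HashSet<u64>,
}

impl RemoteTokens {
    /// Record that token `id` was sent to a remote peer.
    fn issue(&mut self, id: u64) {
        self.outstanding.insert(id);
    }

    /// The local timer for `id` fired. This deliberately does not forget
    /// the token: the remote peer may still hold it and use it.
    /// Returns true if the token is still outstanding.
    fn on_local_timeout(&self, id: u64) -> bool {
        self.outstanding.contains(&id)
    }

    /// The remote peer explicitly reported that it destroyed token `id`.
    /// Returns true if the token was outstanding.
    fn on_release_ack(&mut self, id: u64) -> bool {
        self.outstanding.remove(&id)
    }

    /// The region is safe to write locally only when no token is out.
    fn locally_writable(&self) -> bool {
        self.outstanding.is_empty()
    }
}
```

Here only `on_release_ack` changes the state; a timeout on its own never makes the region writable again.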

Nugine avatar Jun 10 '22 06:06 Nugine

> I just realized that timeout from single side is unsound. We can not ensure that the remote access value is destroyed unless the remote side explicitly notifies the local side.

Yes, the timeout mechanism is unsound. When using je, a timed-out LocalMr is part of a RawMr that will not be deregistered immediately. So the remote end can still access the timed-out remote MR if its timer is wrong. The software cannot be aware of that, and the NIC will not stop it.

GTwhy avatar Jun 10 '22 09:06 GTwhy