parsec icon indicating copy to clipboard operation
parsec copied to clipboard

Remote completion callback should be an AM tag, not a a function pointer disguised as AM tag

Open devreal opened this issue 3 years ago • 4 comments
trafficstars

Original post: https://bitbucket.org/icldistcomp/parsec/pull-requests/217#comment-210854865

The semantic of the r_tag parameter to parsec_ce_put_fn_t and parsec_ce_get_fn_t needs clarification: since it is parsec_ce_tag_t it appears to be a previously registered AM tag. However, the revamp MPI backend treats it as a function pointer. The reason this works right now is that in the remote_deps implementation a function pointer is passed along with the request for data, which is then transferred back as part of the GET and eventually invoked where it came from (the remote that asked for the data). This way it is safe but taking a function’s address on one process and calling it on another is generally not guaranteed to work, thanks to ASLR.

The only safe way is to treat the tag as a tag, not a function pointer. We should probably store the tags in an array (instead of a hash table) for fast lookup, too.

devreal avatar Jun 10 '22 17:06 devreal

I fully agree regarding tag registration—that way the pointer doesn't need to go back and forth and implementation can be simplified (at least with LCI, eventually). That said, I don't think there's (currently) a correctness issue, at least due to ASLR, because the function pointer to be called is sent in the GET_DATA active message and then passed to the Put, which is called with the origin of the GET_DATA as the target. There is an issue to to type casting if there's a system where a function pointer can't be safely cast to a 64-bit integer and back, but that's not a problem anywhere we're running I believe.

omor1 avatar Jun 10 '22 18:06 omor1

Yes, the current implementation in PaRSEC's remote dependencies works precisely because the callback function pointer makes a round-trip. For external users, such as TTG, this is not the case. Since this is now part of the public API we should have the API correct.

devreal avatar Jun 10 '22 18:06 devreal

Regarding the array—that works if we can guarantee the user registers tags in the range of that array, but we have no way of enforcing that. In fact, I'd like a design where rather than the user deciding the AM tag to be registers, the user passes a pointer to a tag and the comm engine decided what tag to use—that way the comm engine can register internal tags (as the MPI one does) while ensuring there are no conflicts with user tags.

That is, rather than the current API, we have

int tag_register(parsec_ce_tag_t tag, parsec_ce_am_callback_t cb, void *cb_data, size_t msg_length);
int new_tag_register(parsec_ce_tag_t *tag, parsec_ce_am_callback_t cb, void *cb_data, size_t msg_length);

Edit: in fact, an earlier version of the MPI comm engine did use an array, until I pointed out the current issues with it.

omor1 avatar Jun 10 '22 18:06 omor1

For external users, such as TTG, this is not the case.

Yup, that would be a problem :). For my above suggestion, I'd add that the comm engine should be required to return the same tag as long as active messages are registered/unregistered in the same order across all processes.

omor1 avatar Jun 10 '22 18:06 omor1