mercury Mercury no longer returns an error when RPC id does not match

I have a test (using Margo) that includes a test case for sending an RPC with an incorrect hg_id_t, not matching a registered RPC on the server. Mercury used to correctly return an error in such case (HG_NO_MATCH in the info->ret field of the callback argument), but with mercury 2.2.0 I'm not getting such an error anymore. The client is getting an HG_SUCCESS and perceives the RPC as having executed.

Note: the test in question is here: the client registers the RPC with a different name than on the server, then creates a handle and forwards it, and gets an HG_SUCCESS from the server.

Aug 22 '22 13:08 mdorier

I'm not sure, we have a test for that. You can see it if you run ctest -R rpc -VV. It does return HG_NOENTRY as expected.

Aug 22 '22 15:08 soumagne

Can you point me to the specific test source? The margo test is pretty simple, so I don't know what changed in Mercury 2.2.0 that made it no longer pass.

Aug 22 '22 16:08 mdorier

I may not be looking at the right place, I looked here: https://github.com/mercury-hpc/mercury/blob/master/Testing/test_rpc.c but it seems that the only test that is expecting a HG_NOENTRY is one where the sender hasn't registered the RPC id provided. The use-case I'm describing is different: the sender has registered it, but not the receiver.

Aug 22 '22 16:08 mdorier

That should be similar to that: https://github.com/mercury-hpc/mercury/blob/master/Testing/test_rpc.c#L795 ? I think we are talking about the same use case?

Aug 22 '22 16:08 soumagne

I don't see anything different between that test and yours ?

Aug 22 '22 16:08 soumagne

I'm confused, here shouldn't it be hg_ret != HG_NOENTRY rather than hg_ret != HG_SUCCESS ?

Aug 22 '22 18:08 mdorier

Ok there was a bug in my margo test, which is fixed now, and the test passes, so the problem is somewhere else. The margo test was meant to reproduce a problem I was seeing in Yokan (a problem that I still have), where when using mercury 2.2.0, forwarding to a wrong "provider id" (the 16 least significant bits of the RPC id) wasn't causing an HG_NO_MATCH/HG_NOENTRY to be returned. I still need to investigate why this happens, as I don't see the problem with mercury 2.1.0...

Aug 22 '22 18:08 mdorier

I'm confused, here shouldn't it be hg_ret != HG_NOENTRY rather than hg_ret != HG_SUCCESS ?

ok yes we could do that, I think it would be better than what we do currently, we just check for it here: https://github.com/mercury-hpc/mercury/blob/master/Testing/test_rpc.c#L290

Aug 22 '22 19:08 soumagne

I'm not entirely sure about the provider id error you see though, are you using __attribute__((__packed__)) on your header ? we haven't really made any changes to anything related to these return codes etc afaik except using packed attributes instead of pragma pack but I would not expect it to change anything...

Aug 22 '22 19:08 soumagne

I'm confused, here shouldn't it be hg_ret != HG_NOENTRY rather than hg_ret != HG_SUCCESS ?

ok yes we could do that, I think it would be better than what we do currently, we just check for it here: https://github.com/mercury-hpc/mercury/blob/master/Testing/test_rpc.c#L290

I understand now why at the line I mentioned it's hg_ret != HG_SUCCESS and not hg_ret != HG_NOENTRY; it tests for the return value of hg_test_rpc, not the callback's info->ret value. But regarding the line you quoted, there is a difference between expecting an HG_NOENTRY from HG_Create and from the callback in HG_Forward. In the former, the RPC hasn't been registered in the sender, and HG_Create will simply fail to create a handle. In the latter, the RPC hasn't been registered on the destination and the HG_NOENTRY is expected in info->ret in the callback passed to HG_Forward. In fact, I see here that HG_NOENTRY is explicitly ignored. So the behavior I'm describing is effectively not tested.

Aug 23 '22 11:08 mdorier

It is tested, it just relies on the message being printed, you can see it when running the test but yes I agree with you it'd be cleaner to not ignore the error code and instead compare against it, I'll push a cleanup fix for it later on.

Aug 23 '22 17:08 soumagne

I bisected commits from 2.1.0 to 2.2.0 to get to the one that makes the problem appear in my code, and it's this one: https://github.com/mercury-hpc/mercury/commit/6fed707d9f2584cb71bbbd8ac083eed9a4ff4408

I don't know exactly what it does. In my setup Mercury is built via spack with those variants: ~bmi~boostsys+checksum+debug~hwloc~ipo~mpi+ofi~psm~psm2+shared+sm~ucx~udreg.

I also tested with ~checksum and the problem is still there.

Margo registers its RPCs by computing a 8-byte hash of the string name, then shifts this hash left 2 bytes to make space for a short provider id (see here). The resulting id is then used to register the RPC using HG_Register.

My application registers an bunch of RPCs with a specific provider id (42). I then proceed to try forwarding an RPC to a provider id of 43. I know for a fact there is no matching handler on the server because all the registered RPC ids have their least 2 significant bytes representing 42, not 43. And yet, the forward function works fine and the result in its callback is HG_SUCCESS. Nothing gets called on the server.

Aug 25 '22 10:08 mdorier

Ah I figured it out! To reproduce the problem, I need to register an RPC with a NULL handler (as if I were a client), then forward the RPC to myself. This returns HG_SUCCESS. This doesn't seem to be the case when sending an RPC to another process, only when sending to self.

Aug 25 '22 11:08 mdorier

I did a reproducer using Margo here and there is an issue in Margo to track the problem here. To summarize:

Sending an RPC to a destination that has associated this RPC with a NULL handler has a different behavior depending on whether the receiver is self or not. If the receiver is self, the sender will get an HG_SUCCESS back, even though nothing was called. If the receiver is not self, the sender will get an HG_INVALID_ARG error. In both situations, one would expect an HG_NOENTRY.

The weird thing is that I can also reproduce the problem with Mercury 2.1.0 in Margo, but in another library I had no issue until Mercury 2.2.0.

Aug 25 '22 14:08 mdorier

I think having HG_INVALID_ARG being returned makes sense for that particular case no ? HG_NOENTRY means that it didn't find the RPC for that ID but if you have associated a NULL handler, that's another story imo, it means you have passed an invalid argument to your register routine. I'll look at the self send case.

Aug 26 '22 17:08 soumagne

Wrt the RPC ID issue that you have, which ID do you receive on the server end when you send ID=43 ? still 43 or some other random value ?

Aug 26 '22 17:08 soumagne

Wrt the RPC ID issue that you have, which ID do you receive on the server end when you send ID=43 ? still 43 or some other random value ?

In the cases that are failing, nothing gets called on the server. It's really not an RPC ID problem, the problem is getting back HG_SUCCESS when the RPC was registered with NULL on the server. I'm fine with HG_INVALID_ARG, at least it's an error code that's not HG_SUCCESS. However in Margo this means that we will have 2 different errors depending on whether the RPC is sent to self (in which case it because of the way margo works, it will end up sending to a NULL handler) or not, hence my preference of an HG_NOENTRY.

Aug 26 '22 21:08 mdorier

Cannot reproduce anymore, this must have been fixed in previous commits. Re-open if you still have issues on this topic.

Jun 06 '23 20:06 soumagne

mercury mercury copied to clipboard

Mercury no longer returns an error when RPC id does not match

mercury
mercury copied to clipboard