ros_comm icon indicating copy to clipboard operation
ros_comm copied to clipboard

Fix for #152 add a an optional timeout on service calls

Open alainsanguinetti opened this issue 3 years ago • 9 comments

A timeout global is defined for XmlRpc connections. Overrides are created to allow users to specify a timeout for specific calls. In particular for ros::master::check a new version is created to allow users to check if the master is up without forever blocking their app. This fix preserves current functionality but allows user to switch on a global timeout by setting it in the XmlRpcClient class.

alainsanguinetti avatar Mar 17 '21 10:03 alainsanguinetti

this makes sense to me, poll(2) system call does not wait forever for the events via XmlRpcClient request. but this seems to be unlikely to merge into mainline because of breaking API/ABI, if i am not mistaken.

I think it only breaks ABI. But I don't know if ABI is a big concern in this library since it doesn't look like it was designed with this concern in mind, for example no use of d-pointer.

alainsanguinetti avatar Mar 18 '21 09:03 alainsanguinetti

right, only ABI breaks. to support the product, ABI compatibility is really important. i am not sure if the mainline takes this fix with breaking it. i would not do that though. but thanks for bringing this up, at least fix makes sense to me.

fujitatomoya avatar Mar 18 '21 09:03 fujitatomoya

@jacobperron how does the review and integration process goes?

alainsanguinetti avatar Apr 22 '21 13:04 alainsanguinetti

Tell me if I am wrong, but I don't see any ABI-breaking changes in this PR.

This PR only adds functions, methods and some static members, and do not changes any existing function, nor adds, removes or reorders class members or virtual methods.

Hugal31 avatar Apr 22 '21 13:04 Hugal31

This PR only adds functions, methods and some static members

thanks for pointing that out, i think that is correct. this does not break ABI.

fujitatomoya avatar Apr 22 '21 22:04 fujitatomoya

Could you provide a reproduction of the original issue that this is supposed to fix? Ideally, as a test case.

Hello @jacobperron the case is the following : client and master are on two different machines, if suddenly the wifi of the master disconnects, the service calls of the client can hang because execute call uses -1 as the timeout so it will not return until the socket sends an event and it will only happen after a quite long duration. The proposition of this PR is to actually use the timeout feature intended with a global timeout value that the user can set. Is it possible to add it as test case somewhere ? It can be reproduced with two docker instances I think

alainsanguinetti avatar Jun 15 '21 19:06 alainsanguinetti

I was hoping that we could write a unit test involving a single host. I think it might be overkill to introduce docker as a test dependency. I'll at least try to reproduce the setup locally with docker and report back here.

jacobperron avatar Jun 15 '21 19:06 jacobperron

ping. How is this progressing?

amilcarlucas avatar Jun 01 '22 10:06 amilcarlucas

This problem persisted since 2013!

It was first posted on #152 then the fix was continued on #738, #1165, #1271, #1282. Unfortunately, it was never got merged. I'd like to lend a hand if needed.

If not now, then this may never be patched forever.

@jacobperron @fujitatomoya @sloretz @mjcarroll Please reconsider checking this PR.

5730289021-NN avatar Dec 08 '22 01:12 5730289021-NN