ros_comm
ros_comm copied to clipboard
`rosgraph.is_master_alive()` has unpredicable delay with certain URIs
Issue
I've been having issues getting a remote node to detect when master is down (detailed here) which I've pinpointed down to the rosgraph.is_master_alive()
call I'm using to check the connection. In short, certain IPs cause the method to take an awful long time compared to others.
Code
Below is the simplest example I've been able to create to reproduce the issue. Master is running at 127.0.1.1.
if __name__ == "__main__":
rospy.init_node("test_node")
uri = "http://192.168.0.1:11311"
status = "Online!" if rosgraph.is_master_online(uri) else "Offline!"
print(status)
Which yields a result of
rosrun test_package test.py
(2m 10s later...)
Offline!
uri = "http://127.0.0.1:11311"
yields an instantaneous result of Online!
uri = "http://1.2.3.4:11311"
yields an instantaneous result of Offline!
After playing around with different IPs it seems that the issue occurs with IPs of the form 192.168.x.x
, though I'm unsure if that range of IPs is specific to my machine.
Potential Solution
The workaround I'm currently using is to wrap the method call in a timeout.
@contextmanager
def timeout(time):
signal.signal(signal.SIGALRM, raise_timeout)
signal.alarm(time)
try:
yield
except TimeoutError:
pass
finally:
signal.signal(signal.SIGALRM, signal.SIG_IGN)
def raise_timeout(signum, frame):
raise TimeoutError
def is_master_alive(uri, time):
with timeout(time):
return rosgraph.is_master_online(uri)
if __name__ == "__main__":
rospy.init_node("test_node")
uri = "http://192.168.0.1:11311"
status = "Online!" if is_master_alive(uri, time=2) else "Offline!"
print(status)
This feels like a lot of unnecessary code, so I think it'd be worthwhile to add an optional timeout parameter to rosgraph.is_master_alive()
. The fact that the method returns at all suggests to me that there's some level of timeout already happening, but I think it'd be nice to allow it to be user-defined.
I'm happy to raise a PR if someone could point me in the right direction.
After adding some prints, I've managed to narrow it down to this line
rosrun test_package test.py
Building Master object
Building master uri
Calling Master._reinit()
Parsing http host and port
Creating ServerProxy object
Created ServerProcy object
Making caller id
Built Master object
Getting PID