"Cannot start hyperg" while it's already running
Description
Golem Version: f5a985ec7d8456e80448b35fe9c31840136d329c
OS: Linux
Branch: b0.23
Reproducible: sometimes
Description of the issue:
When golem is started while hyperg is already running, it usually detects the running instance correctly, but it sometimes fails with:
2020-03-21 04:13:33 CRITICAL golem.client Can't start network. Giving up.
Traceback (most recent call last):
  File "/home/buildbot-worker/worker/test_node_integration/build/golem/client.py", line 373, in start
    self.start_network()
  File "/home/buildbot-worker/worker/test_node_integration/build/golem/client.py", line 480, in start_network
    self.daemon_manager.start()
  File "/home/buildbot-worker/worker/test_node_integration/build/golem/network/hyperdrive/daemon_manager.py", line 116, in start
    return self._start()
  File "/home/buildbot-worker/worker/test_node_integration/build/golem/report.py", line 173, in wrapper
    return func(*args, **kwargs)
  File "/home/buildbot-worker/worker/test_node_integration/build/golem/network/hyperdrive/daemon_manager.py", line 138, in _start
    raise RuntimeError("Cannot start {}".format(self._executable))
RuntimeError: Cannot start hyperg
Actual result:
Golem fails to start.
Steps To Reproduce
- Start hyperg
- Start golem
Expected behavior
Golem should always detect running hyperg.
Logs and any additional context
- https://buildbot.golem.network/buildbot/#builders/15/builds/979 (test test_task_timeout)
- https://buildbot.golem.network/buildbot/#/builders/15/builds/981 (test test_frame_restart)
Hypothesis: Before starting hyperg, golem tries to connect to a potentially existing instance. In Twisted, this might be undefined behavior if it is called from a thread.
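To illustrate the "connect to an existing instance first" step from the hypothesis, here is a minimal sketch that probes for a listening daemon several times before giving up. It is not golem's actual check (which goes through Twisted): it uses plain blocking sockets, and the names `is_daemon_listening` and `wait_for_daemon`, as well as the host, port, attempt count, and delay, are assumptions made for illustration. The issue does not state which address hyperg listens on.

```python
import socket
import time


def is_daemon_listening(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or unreachable: treat as "not running".
        return False


def wait_for_daemon(host: str, port: int, attempts: int = 5,
                    delay: float = 0.2) -> bool:
    """Probe several times so a single failed attempt is not fatal."""
    for _ in range(attempts):
        if is_daemon_listening(host, port):
            return True
        time.sleep(delay)
    return False
```

Retrying the probe would only mask a transient detection failure. It would not help if the detection itself misbehaves when called from a thread.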
AFAIR, the node_integration_tests are responsible for starting their own hyperg.
I think it can be related to:
- hyperg not being closed properly after test1, making test2 fail (zombie-g)
- a race when starting hyperg on the same machine from multiple nodes at the same time
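If the second cause (the startup race) turns out to be the culprit, one possible mitigation is to serialize the check-then-start step across processes with an advisory file lock. The sketch below assumes Linux (it uses `fcntl.flock`); `startup_lock`, `ensure_daemon`, the `is_running` and `start` callables, and the lock path are all hypothetical names and not part of golem's code.

```python
import fcntl
import os
from contextlib import contextmanager


@contextmanager
def startup_lock(path):
    """Hold an exclusive advisory lock on `path` for the duration of the block."""
    fd = os.open(path, os.O_CREAT | os.O_RDWR, 0o600)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX)  # blocks until other holders release
        yield
    finally:
        os.close(fd)  # closing the descriptor releases the lock


def ensure_daemon(is_running, start, lock_path):
    """Check-then-start under the lock; return True if this call started it."""
    with startup_lock(lock_path):
        if is_running():
            return False
        start()
        return True
```

With this shape, two nodes starting at the same time cannot both see "not running" and both try to spawn hyperg: the second one waits for the lock and then finds the daemon already up, provided `start` returns only once the daemon is detectable.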