rally
rally copied to clipboard
Upgrade thespian to fix hang on macOS
Closes #1575
See https://github.com/thespianpy/Thespian/issues/70 for more context.
FWIW, I tested with "thespian==3.10.6" and I can't reproduce the thespian.actors.InvalidActorAddress exception.
Thanks! Interesting. If it's not too much work, would you mind trying again with thespian debug logs activated and share the output?
Thanks! Interesting. If it's not too much work, would you mind trying again with thespian debug logs activated and share the output?
Sure!
I think the key line is here, but I haven't dug into why my system is no longer meeting the capabilities requirements. In addition to testing with 3.10.6 I took a look through the Release Notes to see if there were any changes to the constructor that we hadn't adopted, but the fact that I can only repro this on 3.10.7 leads to to believe that Fix Issue #70: setting TCP_NODELAY on MacOS sometimes fails is probably to blame.
$ tail -f actor-system-internal.log
2023-01-17 17:24:23.715134 p11971 dbg ++++ Starting Admin from /Users/bradleydeam/perf/github.com/b-deam/rally/.venv/lib/python3.8/site-packages/thespian/__init__.py
2023-01-17 17:24:23.715477 p11971 I ++++ Admin started @ ActorAddr-(T|:1900) / gen (3, 10)
2023-01-17 17:24:28.757419 p12025 dbg ++++ Starting Admin from /Users/bradleydeam/perf/github.com/b-deam/rally/.venv/lib/python3.8/site-packages/thespian/__init__.py
2023-01-17 17:24:28.761218 p12025 I ++++ Admin started @ ActorAddr-Q.ThespianQ / gen (3, 10)
2023-01-17 17:24:28.770882 p12025 dbg Admin of ReceiveEnvelope(from: ActorAddr-Q.ThespianQ.a, <class 'thespian.system.messages.multiproc.LoggerConnected'> msg: <thespian.system.messages.multiproc.LoggerConnected object at 0x106bd5dc0>)
2023-01-17 17:24:28.771869 p12025 dbg actualTransmit of TransportIntent(ActorAddr-Q.~-pending-ExpiresIn_0:04:59.999780-<class 'thespian.system.messages.multiproc.EndpointConnected'>-<thespian.system.messages.multiproc.EndpointConnected object at 0x106bd5f10>-quit_0:04:59.999757)
2023-01-17 17:24:28.774415 p11963 dbg actualTransmit of TransportIntent(ActorAddr-Q.ThespianQ-pending-ExpiresIn_0:04:59.999684-<class 'thespian.system.messages.admin.QueryExists'>-<thespian.system.messages.admin.QueryExists object at 0x106bcaf70>-quit_0:04:59.999662)
2023-01-17 17:24:28.775267 p12025 dbg Admin of ReceiveEnvelope(from: ActorAddr-Q.~, <class 'thespian.system.messages.admin.QueryExists'> msg: <thespian.system.messages.admin.QueryExists object at 0x106be6040>)
2023-01-17 17:24:28.775951 p12025 dbg Attempting intent TransportIntent(ActorAddr-Q.~-pending-ExpiresIn_0:04:59.999735-<class 'thespian.system.messages.admin.QueryAck'>-<thespian.system.messages.admin.QueryAck object at 0x106bd5d60>-quit_0:04:59.999715)
2023-01-17 17:24:28.776373 p12025 dbg actualTransmit of TransportIntent(ActorAddr-Q.~-pending-ExpiresIn_0:04:59.999303-<class 'thespian.system.messages.admin.QueryAck'>-<thespian.system.messages.admin.QueryAck object at 0x106bd5d60>-quit_0:04:59.999289)
2023-01-17 17:24:28.779637 p11963 dbg actualTransmit of TransportIntent(ActorAddr-Q.ThespianQ-pending-ExpiresIn_0:04:59.999913-<class 'thespian.system.messages.admin.PendingActor'>-PendingActor#1_of_None-quit_0:04:59.999898)
2023-01-17 17:24:28.780059 p12025 dbg Admin of ReceiveEnvelope(from: ActorAddr-Q.~1, <class 'thespian.system.messages.admin.PendingActor'> msg: PendingActor#1_of_None)
2023-01-17 17:24:28.780399 p12025 I Pending Actor request received for esrally.racecontrol.BenchmarkActor reqs {'coordinator': True} from ActorAddr-Q.~1
2023-01-17 17:24:28.781272 p12025 dbg actualTransmit of TransportIntent(ActorAddr-Q.ThespianQ.a-pending-ExpiresIn_0:04:59.999898-<class 'logging.LogRecord'>-<LogRecord: esrally.actor, 10, /Users/bradleydeam/perf/github.com/b-deam/rally/esrally/actor.py, 122, "Checking capabilities [{'Thespian ActorSystem N...-quit_0:04:59.999882)
- 2023-01-17 17:24:28.782002 p12025 Warn no system has compatible capabilities for Actor esrally.racecontrol.BenchmarkActor
2023-01-17 17:24:28.782706 p12025 dbg Attempting intent TransportIntent(ActorAddr-Q.~1-pending-ExpiresIn_0:04:59.999777-<class 'thespian.system.messages.admin.PendingActorResponse'>-PendingActorResponse(for ActorAddr-Q.~1 inst# 1) errCode 3586 actual None-quit_0:04:59.999756)
2023-01-17 17:24:28.783091 p12025 dbg actualTransmit of TransportIntent(ActorAddr-Q.~1-pending-ExpiresIn_0:04:59.999380-<class 'thespian.system.messages.admin.PendingActorResponse'>-PendingActorResponse(for ActorAddr-Q.~1 inst# 1) errCode 3586 actual None-quit_0:04:59.999366)
2023-01-17 17:24:31.786774 p11963 I ActorSystem shutdown requested.
Also worth adding a stripped down repro using esrallyd:
$ esrallyd start --coordinator-ip 192.168.20.6 --node-ip 192.168.20.6
zsh: /usr/local/bin/esrallyd: bad interpreter: /usr/local/opt/python/bin/python3.7: no such file or directory
multiprocTCPBase {'coordinator': True, 'ip': '192.168.20.6', 'Convention Address.IPv4': '192.168.20.6:1900'}
Traceback (most recent call last):
File "/Users/bradleydeam/perf/github.com/b-deam/rally/.venv/bin/esrallyd", line 8, in <module>
sys.exit(main())
File "/Users/bradleydeam/perf/github.com/b-deam/rally/esrally/rallyd.py", line 107, in main
start(args)
File "/Users/bradleydeam/perf/github.com/b-deam/rally/esrally/rallyd.py", line 39, in start
actor.bootstrap_actor_system(local_ip=args.node_ip, coordinator_ip=args.coordinator_ip)
File "/Users/bradleydeam/perf/github.com/b-deam/rally/esrally/actor.py", line 264, in bootstrap_actor_system
return thespian.actors.ActorSystem(system_base, logDefs=log.load_configuration(), capabilities=capabilities)
File "/Users/bradleydeam/perf/github.com/b-deam/rally/.venv/lib/python3.8/site-packages/thespian/actors.py", line 637, in __init__
systemBase = self._startupActorSys(
File "/Users/bradleydeam/perf/github.com/b-deam/rally/.venv/lib/python3.8/site-packages/thespian/actors.py", line 678, in _startupActorSys
systemBase = sbc(self, logDefs=logDefs)
File "/Users/bradleydeam/perf/github.com/b-deam/rally/.venv/lib/python3.8/site-packages/thespian/system/multiprocTCPBase.py", line 28, in __init__
super(ActorSystemBase, self).__init__(system, logDefs)
File "/Users/bradleydeam/perf/github.com/b-deam/rally/.venv/lib/python3.8/site-packages/thespian/system/multiprocCommon.py", line 83, in __init__
super(multiprocessCommon, self).__init__(system, logDefs)
File "/Users/bradleydeam/perf/github.com/b-deam/rally/.venv/lib/python3.8/site-packages/thespian/system/systemBase.py", line 326, in __init__
self._startAdmin(self.adminAddr,
File "/Users/bradleydeam/perf/github.com/b-deam/rally/.venv/lib/python3.8/site-packages/thespian/system/multiprocCommon.py", line 115, in _startAdmin
raise InvalidActorAddress(adminAddr,
thespian.actors.InvalidActorAddress: ActorAddr-(T|:1900) is not a valid ActorSystem admin
Thanks a lot! The only information left I need to report the bug to Thespian would be the same logs but on thespian 3.10.6.
Also, why specifically 192.168.20.6? Is it you local IP address?
I tried it against a non-loopback interface and got the same error. I wonder if @inqueue has the same issue?
I see the same issue and I also saw it with a patched 3.10.6 during testing. Unfortunately, I actually validated the changes from https://github.com/thespianpy/Thespian/tree/issue70 by patching the version currently in use by Rally (3.10.1). The error has been referenced at https://github.com/thespianpy/Thespian/issues/70#issuecomment-1385832797.
For what it's worth I'm seeing Warn no system has compatible capabilities for Actor esrally.racecontrol.BenchmarkActor on Linux too. Comparing the logs between Thespian 3.10.6 and 3.10.7 would help a lot.
3.10.6
Patch:
diff --git a/pyproject.toml b/pyproject.toml
index 77469e44fceb228f02c9407bbc1f0709c351c1bf..4aee4ecf020899ce2dd0c0e02e2aa45d6674a706 100644
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -58,7 +58,7 @@ dependencies = [
"Jinja2==2.11.3",
"markupsafe==2.0.1",
# License: MIT
- "thespian==3.10.7",
+ "thespian==3.10.6",
# recommended library for thespian to identify actors more easily with `ps`
# "setproctitle==1.1.10",
# always use the latest version, these are certificate files...
Invocation:
$ make install
[...]
$ export THESPLOG_FILE="${THESPLOG_FILE:-${HOME}/.rally/logs/actor-system-internal.log}"
$ export THESPLOG_FILE_MAXSIZE=${THESPLOG_FILE_MAXSIZE:-204800}
$ export THESPLOG_THRESHOLD="DEBUG"
$ esrallyd start --coordinator-ip 192.168.20.6 --node-ip 192.168.20.6
[INFO] Successfully started actor system on node [192.168.20.6] with coordinator node IP [192.168.20.6].
Actor logs:
2023-01-19 14:18:05.685305 p6863 dbg ++++ Starting Admin from /Users/bradleydeam/perf/github.com/b-deam/rally/.venv/lib/python3.8/site-packages/thespian/__init__.py
2023-01-19 14:18:05.685794 p6863 I ++++ Admin started @ ActorAddr-(T|:1900) / gen (3, 10)
2023-01-19 14:18:05.691125 p6863 dbg Admin of ReceiveEnvelope(from: ActorAddr-(T|:64880), <class 'thespian.system.messages.multiproc.LoggerConnected'> msg: <thespian.system.messages.multiproc.LoggerConnected object at 0x1026c5610>)
2023-01-19 14:18:05.691638 p6863 dbg actualTransmit of TransportIntent(ActorAddr-(T|:64879)-pending-ExpiresIn_0:04:59.999885-<class 'thespian.system.messages.multiproc.EndpointConnected'>-<thespian.system.messages.multiproc.EndpointConnected object at 0x1026c5a00>-quit_0:04:59.999870)
2023-01-19 14:18:05.693301 p6844 dbg actualTransmit of TransportIntent(ActorAddr-(T|:1900)-pending-ExpiresIn_0:04:59.999882-<class 'thespian.system.messages.admin.QueryExists'>-<thespian.system.messages.admin.QueryExists object at 0x1026a4c40>-quit_0:04:59.999870)
2023-01-19 14:18:05.693816 p6863 dbg Admin of ReceiveEnvelope(from: ActorAddr-(T|:64879), <class 'thespian.system.messages.admin.QueryExists'> msg: <thespian.system.messages.admin.QueryExists object at 0x1026c5eb0>)
2023-01-19 14:18:05.694198 p6863 dbg Attempting intent TransportIntent(ActorAddr-(T|:64879)-pending-ExpiresIn_0:04:59.999883-<class 'thespian.system.messages.admin.QueryAck'>-<thespian.system.messages.admin.QueryAck object at 0x1026c5040>-quit_0:04:59.999874)
2023-01-19 14:18:05.694414 p6863 dbg actualTransmit of TransportIntent(ActorAddr-(T|:64879)-pending-ExpiresIn_0:04:59.999663-<class 'thespian.system.messages.admin.QueryAck'>-<thespian.system.messages.admin.QueryAck object at 0x1026c5040>-quit_0:04:59.999656)
Rally logs:
2023-01-19 03:48:05,472 -not-actor-/PID:6844 esrally.utils.process DEBUG Running subprocess [git -C /Users/bradleydeam/perf/github.com/b-deam/rally/esrally --version] with logging.
2023-01-19 03:48:05,483 -not-actor-/PID:6844 esrally.utils.process DEBUG git version 2.37.1 (Apple Git-137.1)
2023-01-19 03:48:05,483 -not-actor-/PID:6844 esrally.utils.process DEBUG Subprocess [git -C /Users/bradleydeam/perf/github.com/b-deam/rally/esrally --version] finished with return code [0].
2023-01-19 03:48:05,484 -not-actor-/PID:6844 esrally.utils.process DEBUG Running subprocess [git -C /Users/bradleydeam/perf/github.com/b-deam/rally/esrally rev-parse HEAD] with output.
2023-01-19 03:48:05,499 -not-actor-/PID:6844 esrally.actor INFO Starting actor system with system base [multiprocTCPBase] and capabilities [{'coordinator': True, 'ip': '192.168.20.6', 'Convention Address.IPv4': '192.168.20.6:1900'}].
2023-01-19 03:48:05,686 -not-actor-/PID:6863 root INFO ++++ Actor System gen (3, 10) started, admin @ ActorAddr-(T|:1900)
2023-01-19 03:48:05,686 -not-actor-/PID:6863 root DEBUG Thespian source: /Users/bradleydeam/perf/github.com/b-deam/rally/.venv/lib/python3.8/site-packages/thespian/__init__.py
3.10.7
Invocation:
$ export THESPLOG_FILE="${THESPLOG_FILE:-${HOME}/.rally/logs/actor-system-internal.log}"
$ export THESPLOG_FILE_MAXSIZE=${THESPLOG_FILE_MAXSIZE:-204800}
$ export THESPLOG_THRESHOLD="DEBUG"
$ esrallyd start --coordinator-ip 192.168.20.6 --node-ip 192.168.20.6
Traceback (most recent call last):
File "/Users/bradleydeam/perf/github.com/b-deam/rally/.venv/bin/esrallyd", line 8, in <module>
sys.exit(main())
File "/Users/bradleydeam/perf/github.com/b-deam/rally/esrally/rallyd.py", line 107, in main
start(args)
File "/Users/bradleydeam/perf/github.com/b-deam/rally/esrally/rallyd.py", line 39, in start
actor.bootstrap_actor_system(local_ip=args.node_ip, coordinator_ip=args.coordinator_ip)
File "/Users/bradleydeam/perf/github.com/b-deam/rally/esrally/actor.py", line 263, in bootstrap_actor_system
return thespian.actors.ActorSystem(system_base, logDefs=log.load_configuration(), capabilities=capabilities)
File "/Users/bradleydeam/perf/github.com/b-deam/rally/.venv/lib/python3.8/site-packages/thespian/actors.py", line 637, in __init__
systemBase = self._startupActorSys(
File "/Users/bradleydeam/perf/github.com/b-deam/rally/.venv/lib/python3.8/site-packages/thespian/actors.py", line 678, in _startupActorSys
systemBase = sbc(self, logDefs=logDefs)
File "/Users/bradleydeam/perf/github.com/b-deam/rally/.venv/lib/python3.8/site-packages/thespian/system/multiprocTCPBase.py", line 28, in __init__
super(ActorSystemBase, self).__init__(system, logDefs)
File "/Users/bradleydeam/perf/github.com/b-deam/rally/.venv/lib/python3.8/site-packages/thespian/system/multiprocCommon.py", line 83, in __init__
super(multiprocessCommon, self).__init__(system, logDefs)
File "/Users/bradleydeam/perf/github.com/b-deam/rally/.venv/lib/python3.8/site-packages/thespian/system/systemBase.py", line 326, in __init__
self._startAdmin(self.adminAddr,
File "/Users/bradleydeam/perf/github.com/b-deam/rally/.venv/lib/python3.8/site-packages/thespian/system/multiprocCommon.py", line 115, in _startAdmin
raise InvalidActorAddress(adminAddr,
thespian.actors.InvalidActorAddress: ActorAddr-(T|:1900) is not a valid ActorSystem admin
Actor logs:
2023-01-19 14:10:37.967074 p91216 dbg ++++ Starting Admin from /Users/bradleydeam/perf/github.com/b-deam/rally/.venv/lib/python3.8/site-packages/thespian/__init__.py
2023-01-19 14:10:37.967405 p91216 I ++++ Admin started @ ActorAddr-(T|:1900) / gen (3, 10)
Rally logs:
2023-01-19 03:40:37,917 -not-actor-/PID:91212 esrally.utils.process DEBUG Running subprocess [git -C /Users/bradleydeam/perf/github.com/b-deam/rally/esrally --version] with logging.
2023-01-19 03:40:37,932 -not-actor-/PID:91212 esrally.utils.process DEBUG git version 2.37.1 (Apple Git-137.1)
2023-01-19 03:40:37,932 -not-actor-/PID:91212 esrally.utils.process DEBUG Subprocess [git -C /Users/bradleydeam/perf/github.com/b-deam/rally/esrally --version] finished with return code [0].
2023-01-19 03:40:37,932 -not-actor-/PID:91212 esrally.utils.process DEBUG Running subprocess [git -C /Users/bradleydeam/perf/github.com/b-deam/rally/esrally rev-parse HEAD] with output.
2023-01-19 03:40:37,951 -not-actor-/PID:91212 esrally.actor INFO Starting actor system with system base [multiprocTCPBase] and capabilities [{'coordinator': True, 'ip': '192.168.20.6', 'Convention Address.IPv4': '192.168.20.6:1900'}].
2023-01-19 03:40:37,967 -not-actor-/PID:91216 root INFO ++++ Actor System gen (3, 10) started, admin @ ActorAddr-(T|:1900)
2023-01-19 03:40:37,967 -not-actor-/PID:91216 root DEBUG Thespian source: /Users/bradleydeam/perf/github.com/b-deam/rally/.venv/lib/python3.8/site-packages/thespian/__init__.py
2023-01-19 03:40:42,963 -not-actor-/PID:91212 esrally.actor ERROR Could not initialize internal actor system.
Traceback (most recent call last):
File "/Users/bradleydeam/perf/github.com/b-deam/rally/esrally/actor.py", line 263, in bootstrap_actor_system
return thespian.actors.ActorSystem(system_base, logDefs=log.load_configuration(), capabilities=capabilities)
File "/Users/bradleydeam/perf/github.com/b-deam/rally/.venv/lib/python3.8/site-packages/thespian/actors.py", line 637, in __init__
systemBase = self._startupActorSys(
File "/Users/bradleydeam/perf/github.com/b-deam/rally/.venv/lib/python3.8/site-packages/thespian/actors.py", line 678, in _startupActorSys
systemBase = sbc(self, logDefs=logDefs)
File "/Users/bradleydeam/perf/github.com/b-deam/rally/.venv/lib/python3.8/site-packages/thespian/system/multiprocTCPBase.py", line 28, in __init__
super(ActorSystemBase, self).__init__(system, logDefs)
File "/Users/bradleydeam/perf/github.com/b-deam/rally/.venv/lib/python3.8/site-packages/thespian/system/multiprocCommon.py", line 83, in __init__
super(multiprocessCommon, self).__init__(system, logDefs)
File "/Users/bradleydeam/perf/github.com/b-deam/rally/.venv/lib/python3.8/site-packages/thespian/system/systemBase.py", line 326, in __init__
self._startAdmin(self.adminAddr,
File "/Users/bradleydeam/perf/github.com/b-deam/rally/.venv/lib/python3.8/site-packages/thespian/system/multiprocCommon.py", line 115, in _startAdmin
raise InvalidActorAddress(adminAddr,
thespian.actors.InvalidActorAddress: ActorAddr-(T|:1900) is not a valid ActorSystem admin