piker
piker copied to clipboard
`tractor` bug: serialzation error isn't propagated?!
Guilty line of code when using msgpack in tractor.
Super odd, and pretty sure @iamzoltan found this before but, basically a serialization error due to using msgpack (instead of msgspec) and sending a set causes a crash in tractor.Context.started() but this error isn't propagated upward to the calling actor (emsd in this case).
Instead we get the msg error but the contents of that message aren't an error and instead contain a return?
Dec 14 11:39:42 (emsd, 8459, _emsd_main)) [PDB] tractor._debug _debug.py:529
Attaching to pdb in crashed actor: ('emsd', 'e371c0eb-bf2f-4e79-942f-a5ed5a5eff9a')
Traceback (most recent call last):
File "/home/goodboy/repos/tractor/tractor/trionics/_mngrs.py", line 172, in maybe_open_context
value = cache.values[key]
KeyError: ('binance', 'ethusdt')
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/goodboy/repos/tractor/tractor/_actor.py", line 186, in _invoke
await chan.send({'return': await coro, 'cid': cid})
File "/home/goodboy/repos/piker/piker/clearing/_ems.py", line 1023, in _emsd_main
async with (
File "/usr/lib/python3.9/contextlib.py", line 181, in __aenter__
return await self.gen.__anext__()
File "/home/goodboy/repos/piker/piker/clearing/_ems.py", line 339, in maybe_open_brokerd_trades_dialogue
relay = await self.nursery.start(
File "/home/goodboy/repos/piker/39/lib/python3.9/site-packages/trio/_core/_run.py", line 1041, in start
task._eventual_parent_nursery = self
File "/home/goodboy/repos/piker/39/lib/python3.9/site-packages/trio/_core/_run.py", line 815, in __aexit__
raise combined_error_from_nursery
File "/home/goodboy/repos/piker/piker/clearing/_ems.py", line 430, in open_brokerd_trades_dialogue
async with (
File "/usr/lib/python3.9/contextlib.py", line 181, in __aenter__
return await self.gen.__anext__()
File "/home/goodboy/repos/piker/piker/clearing/_paper_engine.py", line 519, in open_paperboi
async with portal.open_context(
File "/home/goodboy/repos/piker/39/lib/python3.9/site-packages/async_generator/_util.py", line 34, in __aenter__
return await self._agen.asend(None)
File "/home/goodboy/repos/tractor/tractor/_portal.py", line 428, in open_context
first = msg['started']
KeyError: 'started'
And here is the msg contents:
(Pdb++) msg
{'return': None, 'cid': '19c78c86-4ded-4de0-9cb5-9905ffd74325'}
Not sure what the heck is going on but simple tests introduced to tractor don't seem to expose this problem which makes me think it's something to do with a race in ContextCancelled being raised in the target task and/or the wrong error being capture and set as Context._errror? it's that or some kinda wacky task-context id mixup is happening.
Needs some deep debugging inside _invoke() likely...
Boom got it.
It's because we call await ctx.cancel() inside a finally:...from the containing open_feed(): block...
Removing this makes the serialization error propagate.
Need to make a test for this in tractor..
Still no idea how to reproduce with unit test 😂
It seems to be some kind of race where the .started() call errors but the error is rewrapped and discarded during cancellation processing triggered by the calling actor? I'm lost how this is possible atm..