pycapnp icon indicating copy to clipboard operation
pycapnp copied to clipboard

A few questions/issues

Open fvannee opened this issue 5 years ago • 4 comments

First of all, thanks for the great library. This has been great to use so far. When I was checking it out I ran into several situations, in which I wonder if they're bugs or if it's my limited experience with this library. I thought I'd put them in one issue here, but I can turn them into different issues if desired.

The most important one was a case which I ran into where I was posting callbacks on an event loop in another thread. This just hang my whole client after a while. I tried creating a minimally reproducing example which I attached here.

Minimal example test_client.zip

It's just a very simple client/server. But the client will hang on the last request to:

# run something on the pycapnp thread
# now this one will hang
s.run_on_loop(run_test_on_loop(s.tif, 'test01'))

Probably there's something weird going on here with the threading, but I made very sure that everything capnp related is only done in one single thread. That's why it's posting callbacks on the event loop. So I'm a bit at a loss here what's causing this. When I terminate the client (when it hangs), this is the error:

2020-04-15 21:50:46,012 [client    ] [Thread-1    ] background thread started
2020-04-15 21:50:46,013 [asyncio   ] [Thread-1    ] Using selector: EpollSelector
2020-04-15 21:50:46,049 [client    ] [Thread-1    ] after login ( test = <external capability> )
2020-04-15 21:50:46,068 [client    ] [Thread-1    ] onEvent [<test_if_capnp:TestStruct reader (a = 1, b = ["test1", "ef", "test1"], c = 2, d = 3, e = vala)>]
2020-04-15 21:50:46,084 [client    ] [Thread-1    ] onEvent [<test_if_capnp:TestStruct reader (a = 1, b = ["test1", "ef", "test1"], c = 2, d = 3, e = vala)>, <test_if_capnp:TestStruct reader (a = 2, b = ["test2", "ef", "test2"], c = 2, d = 3, e = vala)>]
2020-04-15 21:50:46,528 [client    ] [Thread-1    ] onEvent [<test_if_capnp:TestStruct reader (a = 1, b = ["test1", "ef", "test1"], c = 2, d = 3, e = vala)>, <test_if_capnp:TestStruct reader (a = 2, b = ["test2", "ef", "test2"], c = 2, d = 3, e = vala)>, <test_if_capnp:TestStruct reader (a = 3, b = ["test00", "ef", "test00"], c = 2, d = 3, e = vala)>]
Traceback (most recent call last):
  File "/home/florisvannee/workspace/optibook/examples/test_client.py", line 120, in <module>
    s.run_on_loop(run_test_on_loop(s.tif, 'test01'))
  File "/home/florisvannee/workspace/optibook/examples/test_client.py", line 85, in run_on_loop
    ret = fut.result()
  File "/usr/local/lib/python3.8/concurrent/futures/_base.py", line 434, in result
    self._condition.wait(timeout)
  File "/usr/local/lib/python3.8/threading.py", line 302, in wait
    waiter.acquire()
KeyboardInterrupt
terminate called after throwing an instance of 'kj::ExceptionImpl'
  what():  kj/async.c++:317: failed: expected head == nullptr; EventLoop destroyed with events still in the queue.  Memory leak?; head->trace() = kj::_::ChainPromiseNode
kj::_::TransformPromiseNode<kj::Promise<void>, kj::_::Void, capnp::TwoPartyVatNetwork::OutgoingMessageImpl::send()::{lambda()#1}, kj::_::PropagateException>
kj::_::EagerPromiseNode<kj::_::Void>
stack: 7f686fcf479f 7f686fb27cf5 44844d 4634c4 473524 504478 519bfb 42c46c 7f687a740504 42b263

Second, some small issues/questions:

  • In callbacks, the parameters that are passed only have the lifetime of that method. That's why, in the example I provided in TestFeed, I do a copy.deepcopy before I store them in a list. However, is copy.deepcopy supposed to work all the time? I've had cases where doing the deepcopy would throw, or where the deep copy would succeed, but accessing the object later would throw. Is there any way to just turn the capnp object into a pure Python object for further processing / storing? Unfortunately I don't have a small reproducible example of this. If I can get one I'll post it.
  • In callbacks (like the one in TestFeed), if something throws, the error is not logged in the client (funnily enough, it is sent to the server which reports it). Ideally, I'd want exceptions in the callbacks to just throw (or at least log) exceptions in the client. Is this possible?
  • Callbacks (or any server implementation methods) are by default synchronous. It's not possible to make them async. I think this would be a nice feature to have. Eg. in test_server.py I now have a asyncio.tasks.ensure_future call which would be able with an await if it was an async method itself.

Thanks for the effort put into making this library what it is! :-)

fvannee avatar Apr 15 '20 20:04 fvannee

I gave this a try with the gdb debugger but can't figure out what's going wrong. It looks like it's reaching the capnp code where it sends the last message in:

RemotePromise<DynamicStruct> Request<DynamicStruct, DynamicStruct>::send()

But somehow this still never gets properly sent, because the server never receives this message. I'm a bit at a loss unfortunately.

fvannee avatar Apr 16 '20 21:04 fvannee

I'm not the original author, but I have seen some discussions around related to both pycapnp and capnp that indicate deepcopy (and copying objects) may not behave as you'd expect.

I'm conjecturing, but there's likely a bunch of state included with the datastructures. So if you want to keep it around you'll want to copy the data itself rather than the object. There might be an easier way to do this though.

haata avatar Apr 17 '20 22:04 haata

Thanks! I believe I can answer some of these questions myself now:

  • While deepcopy indeed doesn't work very well, there is actually a to_dict method on the reader objects that convert them to plain Python dictionaries. However, the implementation has a bug that leads to an infinite recursive call. I'll create a PR that fixes this.
  • The hanging when doing the callback on a different thread, was because the result of the call was still an internal library object and this result was passed back to the calling thread. This is something tha the library cannot handle as somehow copying/destruction or so is not handled well in this case. This leads to a promise never resolving. It's fixed by not passing the capnp result, but instead the raw dictionary obtained from the to_dict call.

fvannee avatar Apr 23 '20 20:04 fvannee

Please have a look at https://github.com/capnproto/pycapnp/pull/213 if you time

fvannee avatar Apr 24 '20 17:04 fvannee