zenoh-python
zenoh-python copied to clipboard
[Bug] Session.get handler only calls done callback in v0.11.0-rc.2
Describe the bug
Session.get behaves differently in 0.11.0-rc.2 compared to 0.10.1rc0. When using the Closure reply handler (reply and done callbacks), the reply callback is not called.
To reproduce
Considering the following example:
import time
import zenoh
session1 = zenoh.open()
session2 = zenoh.open()
qable = session1.declare_queryable('test', lambda req: req.reply(zenoh.Sample('test', 'ok')))
session2.get('test',
(lambda reply: print(f'reply "{reply.ok.payload.decode()}"'),
lambda: print('done')))
time.sleep(1)
qable.undeclare()
session1.close()
session2.close()
With version 0.10.1rc0 the console output is as expected:
reply "ok"
done
With version 0.11.0-rc.2 only the done callback is called:
done
System info
- Platform Win11
- Python version 3.10.9
- eclipse-zenoh version 0.11.0-rc.2
~~Could this be related to https://github.com/eclipse-zenoh/zenoh/pull/816?~~
@fsteff the PR you linked is not merge into main yet os it can't be related. @wyfo can you look into it?
I can reproduce the issue, I'm on it.
It seems that Session.declare_queryable
changed in 0.11; the operation is no more taken in account instantly. In fact, adding time.sleep(1)
after session1.declare_queryable(...)
fix the issue.
@Mallets Do you remind about declare_queryable
behavior modification?
I can confirm that a short sleep resolves the issue, also in more complex test cases. Actually even time.sleep(.000000001)
is enough.
Sounds like some sort of concurrency problem (OS context switch is enough?).
Looks like this won't be a problem in most cases. But it breaks most of our unit tests...
Hello @fsteff!
The example you provided has never been guaranteed to work. It always has been a race condition between the propagation of the queryable declaration to session2
through the net stack on one side and the execution of the main thread on the second side. In 0.10.0-rc the scheduling was so that it almost always worked in python. In 0.11.0-rcX we ported zenoh from async-std
to tokio
which greatly impacted the scheduling.
@OlivierHecart I suspected as much. Already fixed most unit tests and upgraded our stack :) It was convenient that this "just worked" - is there a good way to verify that queryables have been propagated? Calling sleep(...) works most of the time, but that's not really a clean solution. The only method I know of utilizes a router's admin space, but out tests work purely p2p.
Arguably a new entity, let's say it a querier
, should be introduce to serve a similar to a publisher
. In that way, it may become easier to know when there is at least one matching queryable. But my comment applies more generic to Zenoh as whole and not only to the Python API.