zenoh-python icon indicating copy to clipboard operation
zenoh-python copied to clipboard

Asyncio support

Open ipsod opened this issue 1 year ago • 7 comments

Describe the bug

Hi.

I was trying to figure out how to use Zenoh with asyncio, and the only mention I could find was at the bottom of this page: https://github.com/eclipse-zenoh/zenoh-python/tree/master/examples

To reproduce

n/a

System info

n/a

ipsod avatar Mar 11 '23 17:03 ipsod

appears to have happened in this commit: https://github.com/eclipse-zenoh/zenoh-python/commit/280385977323c411bb6f80358dadae83413237d9

last versions can be found here: https://github.com/eclipse-zenoh/zenoh-python/tree/8cdb7ee57584a11f43a87f88d5e5e8c86cba111a/examples

atagen avatar Mar 13 '23 05:03 atagen

oddly, the base rust library is wholly async, but the python bindings are sync. we appear to have had async some time in the past, where that text is leftover from. however, I can't find any of the async calls from those old examples grepping the code, so it seems likely those APIs are gone.

kind of a mood killer.

atagen avatar Mar 13 '23 05:03 atagen

Hi,

Since the major API reworks of Zenoh 0.6, we had decided to drop support for the async API in Python. There are a few reasons for this, which kind of amplify each other:

  • Maintaining both a synchronous and asynchronous API is a serious maintenance burden. Everything has to be done twice, sometimes in largely different fashions due to the very different execution models.
  • Of the core members of the Zenoh team, no one is confident in their understanding of asyncio's event loop, nor of how nicely/horribly it may play with Rust's polling futures.
  • To reduce the minimum overhead, Zenoh calls callbacks from its own thread pool. Python's GIL is already a big burden on this execution model, as GIL pre-emption doesn't seem to apply with pyo3. This turns long Python callbacks into a bottleneck for the whole application, and I'm kinda worried (but admittedly don't have proof) that coroutines would make things worse.

If you're working with asyncio, I'd suggest piping your samples through some sort of asynchronous channel, and having a task read the samples from that channel asynchronously. Doing so will have the added bonus of guaranteeing that your callbacks stay short, which will let Zenoh perform to its best.

p-avital avatar Mar 13 '23 08:03 p-avital

As a side note, I'm retagging this issue:

  • Remove bug: the actual bug was in documentation, by still mentioning asyncio which we have intentionally dropped support for.
  • Add enhancement: Provided we find sufficient bandwidth within the team to properly study it, we'll probably come back to how to support asyncio to a standard that warrants the additional maintenance burden (since our justifications for removing it was "we're doing much more work for something we're not convinced we're doing properly").
  • Add wontfix: The team is busy with many new features for the whole Zenoh ecosystem, so I wouldn't expect that asyncio support to come soon.

p-avital avatar Mar 13 '23 08:03 p-avital

Regarding the dangers of a Python script failing to cleanup a Zenoh subscriber...

I sometimes write code so bad that it cannot be exited normally, and the process must be killed. What should I do to clean up after such an event?

ipsod avatar Mar 13 '23 22:03 ipsod

This would be better suited for issue #62, but I'll reply in context: should your process exit abnormally, the link(s) to neighbour(s) will eventually time out (10s by default), at which point all declarations made by the dead session will be undeclared.

Note that this will only happen once the session has become unable to contact its neighbours. If you may declarations in a process that does keep its session alive and well, the infrastructure will assume the session still needs all of the declarations that haven't been undeclared.

p-avital avatar Mar 13 '23 22:03 p-avital

Bringing some relevant discussion points from #187 that @Mallets and I had.

I pointed out that this issue was previously tagged as wontfix, which it seems was a mistake.

@Mallets also mentioned:

"async is still in the project radar for being reintroduced but it needs to be reintroduced properly. One doesn't simply reintroduce async (or walk into Mordor)... Some preparatory work needs to happen before reintroducing it. As said before, having an async keyword in the API doesn't necessarily means the API is async. But we will get there... :)"

keenanjohnson avatar May 07 '24 18:05 keenanjohnson