zarr-python
zarr-python copied to clipboard
[V3] Function signatures for the sync API
This issue is for discussions of the v3 API used for calling async functions from synchronous functions. (Not to be confused with the API for synchronizing the state of storage across multiple workers). I couldn't find a previous issue for discussing the sync api in v3; happy to close this if one already exists.
In v3 we synchronize async code with the Sync mixin class
Sync
has a .sync
method, which just wraps the sync
function defined in the same module.
The body of sync
looks like this:
def sync(coro: Coroutine, loop: Optional[asyncio.AbstractEventLoop] = None):
"""
Make loop run coroutine until it returns. Runs in other thread
Examples
--------
>>> sync(async_function(), existing_loop)
"""
if loop is None:
# NB: if the loop is not running *yet*, it is OK to submit work
# and we will wait for it
loop = _get_loop()
if loop is None or loop.is_closed():
raise RuntimeError("Loop is not running")
try:
loop0 = asyncio.events.get_running_loop()
if loop0 is loop:
raise NotImplementedError("Calling sync() from within a running loop")
except RuntimeError:
pass
result_box: List[Optional[Any]] = [None]
event = threading.Event()
asyncio.run_coroutine_threadsafe(_runner(event, coro, result_box), loop)
while True:
# this loops allows thread to get interrupted
if event.wait(1):
break
return_result = result_box[0]
if isinstance(return_result, BaseException):
raise return_result
else:
return return_result
This function in turn relies on _runner
, which among its arguments takes a list, which it will mutate. Notably, _runner
returns None
-- _runner
does its work by mutating one of its arguments.
Is there a way to get the same behavior with a conventional function that returns a value? I.e., without something like _runner
mutating one of its arguments. Because sync
relies on _runner
mutating the result_box
variable, and because result_box
is initialized to [None]
, it becomes impossible to distinguish "_runner ran a function that returned None
" from "_runner never ran anything".
The type annotation we want for sync
would be something like def sync(coro: Coroutine[Any, Any, T], loop: asyncio.AbstractEventLoop | None = None) -> T
, but the use of result_box
without some way to ensure that we never return its initial value of None
forces us to instead have a return type of T | None
, which isn't really what we want from sync
-- it should only ever return the result of calling the coroutine, never a dummy value.
I got something to work that avoids the need for result_box
, which I will open in a companion PR to this issue.
To add some extra context. The sync approach we're using came from zarrita which came from fsspec which was written by @martindurant 🙌 . There is some black magic in there, no doubt -- but it does seem to work.
cc @dstansby who has been doing lots of work on the Zarr v3 type checkers
to summarize some comversation from #1804, one approach would be to import all of this stuff from fsspec, and implement as little as possible within zarr-python
. This would be expedient, but would introduce a bit of an odd dependency relationship between zarr-python
and fsspec. Also, we would need to add type annotations to the functions in fsspec, but that's not a big blocker, and should probably happen anyway.
@martindurant suggested this idea
I think https://github.com/zarr-developers/zarr-python/pull/1804 cleaned this up enough to close. @d-v-b, reopen if you disagree.