zarr-python

[V3] Function signatures for the sync API

Open d-v-b opened this issue 10 months ago • 2 comments

This issue is for discussing the v3 API used for calling async functions from synchronous functions (not to be confused with the API for synchronizing the state of storage across multiple workers). I couldn't find a previous issue for discussing the sync API in v3; happy to close this if one already exists.

In v3 we synchronize async code with the Sync mixin class.

Sync has a .sync method, which just wraps the sync function defined in the same module.

The body of sync looks like this:

def sync(coro: Coroutine, loop: Optional[asyncio.AbstractEventLoop] = None):
    """
    Make loop run coroutine until it returns. Runs in other thread

    Examples
    --------
    >>> sync(async_function(), existing_loop)
    """
    if loop is None:
        # NB: if the loop is not running *yet*, it is OK to submit work
        # and we will wait for it
        loop = _get_loop()
    if loop is None or loop.is_closed():
        raise RuntimeError("Loop is not running")
    try:
        loop0 = asyncio.events.get_running_loop()
        if loop0 is loop:
            raise NotImplementedError("Calling sync() from within a running loop")
    except RuntimeError:
        pass
    result_box: List[Optional[Any]] = [None]
    event = threading.Event()
    asyncio.run_coroutine_threadsafe(_runner(event, coro, result_box), loop)
    while True:
        # this loops allows thread to get interrupted
        if event.wait(1):
            break

    return_result = result_box[0]
    if isinstance(return_result, BaseException):
        raise return_result
    else:
        return return_result

This function in turn relies on _runner, which among its arguments takes a list, which it will mutate. Notably, _runner returns None -- _runner does its work by mutating one of its arguments.

Is there a way to get the same behavior with a conventional function that returns a value? I.e., without something like _runner mutating one of its arguments. Because sync relies on _runner mutating the result_box variable, and because result_box is initialized to [None], it becomes impossible to distinguish "_runner ran a function that returned None" from "_runner never ran anything".
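To make the ambiguity concrete, here is a minimal sketch of a helper with the shape described above. The body of _runner is an assumption reconstructed from that description (it is not copied from zarr-python), and returns_none, ran_box, ran_event and untouched_box are hypothetical names used only for illustration.

```python
import asyncio
import threading
from typing import Any, Coroutine, List, Optional


async def _runner(
    event: threading.Event,
    coro: Coroutine[Any, Any, Any],
    result_box: List[Optional[Any]],
) -> None:
    """Await coro, store its result (or exception) in result_box[0], then set event.

    Assumed shape only: note that it returns None and communicates
    exclusively by mutating result_box.
    """
    try:
        result_box[0] = await coro
    except Exception as ex:
        result_box[0] = ex
    finally:
        event.set()


async def returns_none() -> None:
    """Hypothetical coroutine whose legitimate result is None."""
    return None


# Case 1: _runner ran a coroutine that returned None.
ran_box: List[Optional[Any]] = [None]
ran_event = threading.Event()
asyncio.run(_runner(ran_event, returns_none(), ran_box))

# Case 2: nothing was ever run against this box.
untouched_box: List[Optional[Any]] = [None]

# The two boxes hold the same contents; only the event records that work happened.
print(ran_box == untouched_box, ran_event.is_set())
```

In this sketch the box alone cannot tell the two cases apart, which is why a type checker has to treat the value read from it as optional.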

The type annotation we want for sync would be something like def sync(coro: Coroutine[Any, Any, T], loop: asyncio.AbstractEventLoop | None = None) -> T, but using result_box without some way to ensure that we never return its initial value of None forces us to use a return type of T | None instead. That isn't really what we want from sync -- it should only ever return the result of awaiting the coroutine, never a dummy value.

I got something to work that avoids the need for result_box, which I will open in a companion PR to this issue.
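As an illustration only, and not necessarily what the companion PR does, one way to drop result_box is to use the concurrent.futures.Future that asyncio.run_coroutine_threadsafe already returns. The sketch below assumes the event loop runs in a separate thread; start_background_loop, add and nothing are hypothetical helpers, and loop is made a required argument to keep the example self-contained.

```python
import asyncio
import threading
from typing import Any, Coroutine, TypeVar

T = TypeVar("T")


def start_background_loop() -> asyncio.AbstractEventLoop:
    """Hypothetical helper: run a fresh event loop forever in a daemon thread."""
    loop = asyncio.new_event_loop()
    threading.Thread(target=loop.run_forever, daemon=True).start()
    return loop


def sync(coro: Coroutine[Any, Any, T], loop: asyncio.AbstractEventLoop) -> T:
    """Run coro on loop (which lives in another thread) and return its result."""
    if loop.is_closed():
        raise RuntimeError("Loop is not running")
    try:
        running = asyncio.get_running_loop()
    except RuntimeError:
        running = None  # no event loop is running in the calling thread
    # Checked outside the try block: NotImplementedError subclasses RuntimeError,
    # so raising it inside the try would be swallowed by the except clause.
    if running is loop:
        raise NotImplementedError("Calling sync() from within a running loop")
    future = asyncio.run_coroutine_threadsafe(coro, loop)
    # Future.result() blocks until the coroutine finishes, then either returns
    # its value or re-raises the exception it raised. No placeholder is involved.
    return future.result()


async def add(a: int, b: int) -> int:
    return a + b


async def nothing() -> None:
    return None


loop = start_background_loop()
total = sync(add(1, 2), loop)           # typed as int
nothing_result = sync(nothing(), loop)  # None, but only because the coroutine returned it
```

Because the value comes straight from the future, a type checker can infer T from the coroutine. This sketch drops the one-second polling loop of the original; whether that matters for interrupting the waiting thread is not addressed here.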

d-v-b, Apr 21 '24 14:04

To add some extra context: the sync approach we're using came from zarrita, which got it from fsspec, where it was written by @martindurant 🙌. There is some black magic in there, no doubt -- but it does seem to work.

cc @dstansby who has been doing lots of work on the Zarr v3 type checkers

jhamman, Apr 21 '24 16:04

To summarize some conversation from #1804: one approach would be to import all of this from fsspec and implement as little as possible within zarr-python. This would be expedient, but it would introduce a somewhat odd dependency relationship between zarr-python and fsspec. We would also need to add type annotations to the functions in fsspec, but that's not a big blocker and should probably happen anyway.

@martindurant suggested this idea.

d-v-b, Apr 21 '24 18:04

I think https://github.com/zarr-developers/zarr-python/pull/1804 cleaned this up enough to close. @d-v-b, reopen if you disagree.

jhamman, May 17 '24 16:05