trimeter Which functions should we expose, and what should they be called?

In my initial draft, I have 3 functions:

run_on_each: concurrent map, with results optionally directed to a SendChannel, no return value
amap: concurrent map, called via "async with", with results provided as an async iterable
run_all: concurrent call-all-these-callables, with results provided at the end as a list
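
For concreteness, here is a rough sketch of the run_on_each convention (concurrent map, optional results channel, no return value). It is not trimeter's implementation. It uses asyncio from the standard library as a stand-in for Trio so that it runs without extra dependencies, and an asyncio.Queue in place of a SendChannel. The `results` parameter name and the `double` and `demo` helpers are made up for illustration.

```python
import asyncio


async def run_on_each(fn, args, *, results=None):
    """Call fn(arg) concurrently for each arg; return nothing.

    `results` is a hypothetical parameter: an asyncio.Queue standing in
    for a SendChannel. If given, each result is put on it; otherwise the
    results are discarded.
    """
    async def worker(arg):
        value = await fn(arg)
        if results is not None:
            await results.put(value)

    await asyncio.gather(*(worker(arg) for arg in args))


async def double(x):
    await asyncio.sleep(0)  # yield to the event loop, as real I/O would
    return 2 * x


async def demo():
    # Discard mode: nothing comes back.
    discarded = await run_on_each(double, [1, 2, 3])

    # Channel mode: results land on the queue in completion order,
    # so sort them before comparing.
    queue = asyncio.Queue()
    await run_on_each(double, [1, 2, 3], results=queue)
    collected = sorted(queue.get_nowait() for _ in range(queue.qsize()))
    return discarded, collected
```

Calling `asyncio.run(demo())` runs both modes; the point is only that one function can cover "discard" and "send-on-channel" by making the channel optional.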

I'm not at all sure that these three are the right set to provide, or that we have the names right.

I guess there's a two-dimensional space of calling conventions:

Input handling: fn+iterable (map style) vs. iterable-of-fns (gather-style)
Output handling: discard vs. send-on-channel vs. async-with-returning-async-iterable vs. nursery.start-returning-async-iterable vs. big-list-at-end

So in principle there are 2*5 = 10 functions we could provide here... but that's way too many and too confusing, so we need to cut it down somehow.

Oct 07 '18 15:10 njsmith

Hmm.

Discard vs send-on-channel can be the same function by taking an optional channel argument (as run_on_each already does).
run_on_each could also take optional task_status; if used with start(), it would internally create a channel pair and make start() return the receive end.
For fn+iterable versus iterable-of-fns, I think the split you have here makes sense. In practice, people are going to use this with either "a handful of things" or "an unknown large number of things". The "handful" case is much more likely to be running different functions and to not care about getting the results incrementally. It's easy enough to adapt between the two conventions: run_all([partial(fn, arg) for arg in args]) or run_on_each(lambda fn: fn(), fns).

(fun fact: for maximum inscrutability points, that run_all could also be run_all(map(partial(partial, fn), args)))
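
A small sketch of those adapters, to check that the spellings really are equivalent. It uses plain synchronous functions so it runs without an event loop; `fn`, `args`, and `call` are hypothetical stand-ins, and the real run_all / run_on_each would of course take async callables.

```python
from functools import partial


def fn(arg):
    # Hypothetical stand-in for the function being mapped.
    return arg * 10


args = [1, 2, 3]

# map style -> gather style: bind each argument up front.
thunks = [partial(fn, arg) for arg in args]

# The "inscrutable" spelling: partial(partial, fn)(arg) builds partial(fn, arg).
thunks_inscrutable = list(map(partial(partial, fn), args))

# gather style -> map style: map a function that just calls its argument.
call = lambda thunk: thunk()

results = [call(t) for t in thunks]
results_inscrutable = [call(t) for t in thunks_inscrutable]
```

Both lists of thunks produce the same results when called, which is all the adapter needs to guarantee.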

So I think the three we have now are a good three to be working with. Friendly amendment: maybe run_all can support being invoked with positional *args, so you can say run_all(thunk, thunk, thunk) instead of needing another pair of delimiters. I think the case where all the functions are listed in the code will be pretty common. Could also support both conventions, under the theory that callable iterables are rare.
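
One way "support both conventions" might look, as a sketch only. `normalize_thunks` is a hypothetical helper, not part of trimeter; it assumes a single non-callable argument is an iterable of thunks and treats everything else as thunks passed positionally.

```python
def normalize_thunks(*args):
    """Hypothetical helper: accept thunks positionally or as one iterable.

    Caveat: a single argument that is both callable and iterable is
    treated as a thunk, which is the "callable iterables are rare" bet.
    """
    if len(args) == 1 and not callable(args[0]):
        # run_all([thunk, thunk, thunk])
        return list(args[0])
    # run_all(thunk, thunk, thunk), including the one-thunk and zero-thunk cases
    return list(args)


def a():
    return "a"


def b():
    return "b"
```

With this, `normalize_thunks(a, b)` and `normalize_thunks([a, b])` give the same list, so run_all could call it first and then proceed as it does today.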

Naming:

run_all is my favorite name of the three, it's very clear and Trio-ish, and it seems obvious to me that it takes a bunch of thunks and returns a list. asyncio calls this gather but I don't think that's as good of a name.
run_on_each also seems clear. I don't love that it parallels run_all but has a different result convention, but I can live with that.
amap is a little inscrutable compared to the other two, and I look at it and expect something that returns a list (probably due to too much exposure to the builtin map in Python 2.x, which returned a list). We could borrow asyncio's terminology and call it as_completed, maybe? Or running_on_each or some other gerund-y form to emphasize the context manager aspect.

Sep 03 '20 08:09 oremanj

I'll put in a plug for an async as_completed function: https://github.com/groove-x/trio-util/issues/7

IIUC, that's what run_all does, but as an async iterator. I don't want to have to wait until all the functions have completed before I can pass the results on to the next step in the pipeline.
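
Roughly the shape being asked for, sketched with asyncio from the standard library rather than Trio (so this is not the trio-util or trimeter API). `results_as_completed`, `delayed`, and `demo` are hypothetical names; the only library calls used are `asyncio.ensure_future`, `asyncio.as_completed`, and `asyncio.sleep`.

```python
import asyncio


async def results_as_completed(thunks):
    """Start every thunk, then yield each result as soon as it is ready."""
    tasks = [asyncio.ensure_future(thunk()) for thunk in thunks]
    try:
        for next_done in asyncio.as_completed(tasks):
            yield await next_done
    finally:
        # If the consumer stops early, don't leave tasks running.
        for task in tasks:
            task.cancel()


def delayed(value, delay):
    # Hypothetical helper: build a thunk that returns `value` after `delay` seconds.
    async def thunk():
        await asyncio.sleep(delay)
        return value
    return thunk


async def demo():
    thunks = [delayed("slow", 0.15), delayed("fast", 0.01), delayed("medium", 0.08)]
    # Each result can be handed to the next pipeline stage as it arrives.
    return [result async for result in results_as_completed(thunks)]
```

Results come out in completion order rather than input order, so the consumer can start on the first result while the slower calls are still running.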

Sep 03 '20 09:09 dhirschfeld
