Add collection of worked examples to tutorial
We should refactor the tutorial into an initial part that's similar to what we have now, or the middle part of my talk, and then a collection of examples that also serve as excuses to go into more depth on particular topics.
I expect the list will grow over time, but here are some ideas. (Actually the main reason I'm filing this is to have a place to collect these so I don't lose them.)
- The current tracing demo should move here. It's a good intro to trio introspection and to co-op concurrency, but having it in the main tutorial like it is now is a big blob of text for folks to wade through if they already know this stuff. (We can/should link to it from the async/await intro though.)
- Happy eyeballs (for people who saw the talk but want a text version; as a demo of passing nursery objects around; ...)
- Multiplexing rpc (#467)
- Catch-all exception handler
- Custom nursery, like a race function or ignore-errors nursery (maybe both; there's a rough sketch of the race idea after this list)
- Some standard stuff like echo server, proxy, fast web spider, ... Whatever doesn't end up in the main tutorial. (We could have echo server and TCP proxy as two examples, then show how to run them both within a single process as an example of implementing multi-protocol servers... and also to show off how `proxy_one_way` can be re-used for both! Maybe the proxy should demonstrate mirroring `localhost:12345` to `httpbin:80`, so people can try it out with their web browsers?)
- trio-asyncio example?
- nursery.start
Possibly some of these could be combined or form sequences, e.g. echo server -> catch-all handler -> nursery.start
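For the race idea mentioned above, a minimal sketch might look like this (very much a sketch; it doesn't try to be clever about ties or partial results, and errors just propagate out of the nursery):

```python
import trio

async def race(*async_fns):
    # Run several async functions concurrently; return the first result and
    # cancel the rest.
    if not async_fns:
        raise ValueError("need at least one async function")

    winner = None

    async def jockey(async_fn, cancel_scope):
        nonlocal winner
        winner = await async_fn()
        cancel_scope.cancel()

    async with trio.open_nursery() as nursery:
        for async_fn in async_fns:
            nursery.start_soon(jockey, async_fn, nursery.cancel_scope)

    return winner
```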
An asyncio.gather-like helper (collect the results of all the tasks and return them) would be a good example (as you've said before on Gitter, IIRC)
Oh yeah, good idea! (And we should use the terms asyncio.gather and Promise.all in the headline, because people seem to be looking for those.)
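For the docs, the core of it could be as small as this sketch (error handling is just "let the nursery cancel everything and propagate", which may or may not be what people coming from asyncio.gather expect):

```python
import trio

async def gather(*async_fns):
    # Run all the async functions concurrently and return their results in
    # the same order they were passed in.
    results = [None] * len(async_fns)

    async def runner(i, async_fn):
        results[i] = await async_fn()

    async with trio.open_nursery() as nursery:
        for i, async_fn in enumerate(async_fns):
            nursery.start_soon(runner, i, async_fn)

    return results
```

Callers would pass in zero-argument async callables (e.g. functools.partial(fetch, url)), since nurseries want functions rather than coroutine objects.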
Oh, see also #421, which is a partial duplicate and has some more discussion of the nursery-based examples.
Oh duh, here's another one: an example of implementing a custom protocol, by combining a sansio protocol with the stream interface. (Probably some simple line-oriented or netstring-oriented thing. This is one of the motivations for why I started working on sansio_toolbelt. I should make it actually usable...)
A walkthrough for converting a sync protocol to Trio might also make sense.
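Even without a real sans-io library, the stream-wrapping half of the line-oriented version could be as small as this sketch (LineStream is a made-up name, and there's no attempt to handle encodings or anything fancy beyond a crude max-line check):

```python
import trio

class LineStream:
    # Minimal line-oriented wrapper around any trio Stream.
    def __init__(self, stream, max_line=16384):
        self.stream = stream
        self.max_line = max_line
        self._buffer = b""

    async def send_line(self, line: bytes):
        await self.stream.send_all(line + b"\r\n")

    async def receive_line(self) -> bytes:
        while b"\r\n" not in self._buffer:
            if len(self._buffer) > self.max_line:
                raise ValueError("line too long")
            data = await self.stream.receive_some(4096)
            if not data:
                raise EOFError("connection closed")
            self._buffer += data
        line, _, self._buffer = self._buffer.partition(b"\r\n")
        return line
```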
WRT trio-asyncio: maybe simply refer to it. I do need to add some example that shows how to convert from asyncio to trio-asyncio to trio, and how that improves the code. ;-)
> I do need to add some example that shows how to convert from asyncio to trio-asyncio to trio, and how that improves the code. ;-)
Would love to see that. 😁
Something like @jab's HTTP CONNECT proxy from https://github.com/python-trio/trio/pull/489#issuecomment-379455747 might be interesting too.
(Possibly rewritten to use h11 ;-).)
@oremanj's as_completed here might be interesting as the basis for a gather equivalent: https://gitter.im/python-trio/general?at=5ad186345d7286b43a29af53
Maybe examples of testing and debugging would be good too. (Testing might just refer to the pytest-trio docs. Which we still need to write...)
There was some more discussion of this on Gitter today, which resulted in rough drafts of reasonable implementations for gather and as_completed both: https://gitter.im/python-trio/general?at=5ae22ef11130fe3d361e4e25
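Roughly along these lines, as a sketch (not the exact code from that Gitter thread; it assumes at least one async function and leaves error handling to the nursery):

```python
import trio

def as_completed(nursery, async_fns):
    # Start the given async functions in the caller's nursery and return a
    # receive channel that yields their results in completion order.
    async_fns = list(async_fns)  # assumes at least one entry
    send_chan, receive_chan = trio.open_memory_channel(len(async_fns))
    remaining = len(async_fns)

    async def runner(async_fn):
        nonlocal remaining
        await send_chan.send(await async_fn())
        remaining -= 1
        if remaining == 0:
            # Last task to finish closes the channel, ending iteration.
            await send_chan.aclose()

    for async_fn in async_fns:
        nursery.start_soon(runner, async_fn)
    return receive_chan
```

Usage would be `async for result in as_completed(nursery, fns): ...`, since iterating a receive channel stops once the send side is closed.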
@N-Coder pointed out that there are a number of useful "asyncio cookbook" type articles floating around, and probably trio would benefit from something that serves a similar role. I think that's the same idea in this thread, but the examples are potentially helpful:
- https://medium.com/python-pandemonium/asyncio-coroutine-patterns-beyond-await-a6121486656f
- https://hackernoon.com/asyncio-for-the-working-python-developer-5c468e6e2e8e
#527 is a common question; we should have something for it.
As mentioned in #537, a UDP example would be good. This would also be a good way to demonstrate using trio.socket directly.
The example in that comment thread is kind of boring. Doing a DNS query or NTP query would be more interesting. [edit: see notes-to-self/ntp-example.py for an NTP query example.]
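For reference, the bare-bones shape of such an example would be something like this sketch (host and payload are placeholders; a real NTP/DNS example needs to construct and parse the actual packet):

```python
import trio

async def udp_query(host, port, payload, timeout=5):
    # Send one datagram, wait for one reply, give up after `timeout` seconds.
    with trio.move_on_after(timeout):
        gai = await trio.socket.getaddrinfo(
            host, port, family=trio.socket.AF_INET, type=trio.socket.SOCK_DGRAM
        )
        addr = gai[0][4]  # sockaddr of the first result
        sock = trio.socket.socket(trio.socket.AF_INET, trio.socket.SOCK_DGRAM)
        try:
            await sock.sendto(payload, addr)
            reply, _ = await sock.recvfrom(1024)
            return reply
        finally:
            sock.close()
    return None  # timed out
```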
It would be good to have an example that discusses the subtleties of aclose: in general, cancellation means "stop what you're doing ASAP and clean up". But if what you're doing is cleaning up... what does that mean? Basically just "clean up ASAP" → graceful vs forceful close.
Maybe this would fit in with an example that wraps a stream in a higher-level protocol object, so we have to write our own aclose? Esp. if the protocol has some kind of cleanup step.
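A sketch of what that might look like for a made-up protocol wrapper (the goodbye frame and the 3-second grace period are invented for the example):

```python
import trio

class ProtocolWrapper:
    def __init__(self, stream):
        self.stream = stream

    async def aclose(self):
        try:
            # Graceful part: try to say goodbye, but bound how long it can
            # take, and shield it so it still runs if we're already cancelled.
            with trio.move_on_after(3) as cleanup_scope:
                cleanup_scope.shield = True
                await self.stream.send_all(b"BYE\r\n")
        finally:
            # Forceful part: always release the underlying stream.
            await self.stream.aclose()
```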
Interaction between __del__ and trio is another thing we should discuss somewhere. (It's very tricky. The obvious problem is that __del__ isn't an async method, so it can't await anything. But it's actually much worse than that. __del__ methods are called at arbitrary moments, so they have all the same complexity as signal handlers. Basically the only operation that's guaranteed to be usable from __del__ is TrioToken.run_sync_soon. At least this is better than asyncio, where AFAICT literally no methods are guaranteed to be usable from __del__, but it's still extremely tricky.)
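A sketch of about the only safe pattern (ManagedResource is made up; using the current trio.lowlevel spelling of the token API):

```python
import trio

class ManagedResource:
    def __init__(self):
        # Capture the token while we're on the trio thread, so __del__ can
        # find its way back later.
        self._token = trio.lowlevel.current_trio_token()
        self._closed = False

    def close(self):
        # Synchronous, idempotent cleanup; safe to call from the trio thread.
        self._closed = True

    def __del__(self):
        if not self._closed:
            # __del__ can run at any arbitrary moment, so don't touch trio
            # directly; just schedule the sync cleanup and hope for the best.
            try:
                self._token.run_sync_soon(self.close)
            except trio.RunFinishedError:
                pass
```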
Channel examples – we might move the ones that are currently in reference-core.rst here.
It would also be good to have an example of tee, if only to have something to point to when explaining that it's not what ReceiveChannel.clone does. The simplest version is something like:
```python
async def send_all(value, send_channels):
    async with trio.open_nursery() as nursery:
        for send_channel in send_channels:
            nursery.start_soon(send_channel.send, value)
```
But then there are complications to consider around cancellation, and error-handling, and back-pressure...
Using a buffered memory channel to implement a fixed-size database connection pool
Re previous message: @ziirish wrote a first draft: https://gist.github.com/ziirish/ab022e440a31a35e8847a1f4c1a3af1d
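The basic shape could be something like this sketch (make_connection is a hypothetical factory; using send_nowait in the cleanup path avoids an await while unwinding):

```python
import trio
from contextlib import asynccontextmanager

class ConnectionPool:
    def __init__(self, size):
        # The buffered channel holds the idle connections.
        self._send, self._receive = trio.open_memory_channel(size)
        self._size = size

    async def fill(self, make_connection):
        for _ in range(self._size):
            await self._send.send(await make_connection())

    @asynccontextmanager
    async def borrow(self):
        conn = await self._receive.receive()  # waits if all connections are in use
        try:
            yield conn
        finally:
            # There's guaranteed to be a free slot, so this can't block.
            self._send.send_nowait(conn)
```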
Zero-downtime upgrade (via socket activation, socket passing, unix-domain socket + atomic rename?)
example of how to "hide" a nursery inside a context manager, using @asynccontextmanager (cf https://github.com/python-trio/trio/issues/882#issuecomment-457962244)
[Edit: And also, what to do in case you need to support async with some_object: ...]
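A minimal version of that pattern might look like this (background_server, handler, and port are placeholders; the real discussion of the pitfalls is in the linked comment):

```python
import trio
from contextlib import asynccontextmanager

@asynccontextmanager
async def background_server(handler, port):
    async with trio.open_nursery() as nursery:
        nursery.start_soon(trio.serve_tcp, handler, port)
        try:
            # The caller's block runs with the server going in the background.
            yield nursery
        finally:
            # Tear down the background tasks when the caller's block exits.
            nursery.cancel_scope.cancel()

# usage sketch:
#     async with background_server(my_handler, 8080):
#         await do_other_stuff()
```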
Maybe some information about how to do Tasks in trio, like here: https://github.com/python-trio/trio/issues/892#issuecomment-459195578
[Note: I think this means asyncio.Task-equivalents -njs]
As requested by @thedrow (e.g. #931), it would be great to have a simple worked example of wrapping a callback/fd-based C library and adapting it to Trio style, demonstrating wait_readable/wait_writable. We'll want to take special care to talk about cancellation too, because that's important and easy for newcomers to forget about.
I'm not sure what the best way to do this would be. Callback-based C libraries tend to be complicated and have idiosyncratic APIs. Which to some extent is useful for an example, because we want to show people how to handle their own complicated and idiosyncratic API, but it can also be problematic, because we don't want to force people to go spend a bunch of time learning about details of some random library they don't care about.
We could write our own toy library just for the example, in C or Rust or whatever.
We could pick an existing library that we think would be good pedagogically. Which one? Ideally: fairly straightforward interface, accomplishes a familiar task, already has a thin Python wrapper or it's trivial to make one through cffi. Some possibilities:
- libpq (wrapper: psycopg2, async support)
- curl (wrapper: pycurl, this uses multi support, curl docs)
- c-ares? pretty simple but integrating its cancellation support with trio would be a mess (you can cancel all operations using a socket, but can't cancel one operation using a socket)
- hiredis? @thedrow mentioned it as a library they were interested in. There's a wrapper called `hiredis-py`, but from the README it sounds like it's a pure parsing/sans-io library, and doesn't do I/O at all, so wrapping it in trio would look exactly like wrapping it with any other I/O system? The underlying `hiredis` library has sync, async, and sans-io APIs, so I guess hiredis-py just wraps the sans-io API. I suppose we could demonstrate using cffi to wrap the async API?
- Does anyone else have a favorite?
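Whichever library we pick, the core event-loop integration will look roughly like this sketch; everything on lib_handle here (finished, wants_write, process_readable, process_writable, cancel_pending_operations) is a hypothetical stand-in for whatever the real binding exposes:

```python
import trio

async def drive_library(lib_handle, sock):
    # Pump the (hypothetical) callback-based C library by waiting until its
    # fd is ready and then letting it do its non-blocking work.
    while not lib_handle.finished():
        if lib_handle.wants_write():
            await trio.lowlevel.wait_writable(sock)
            lib_handle.process_writable()
        else:
            try:
                await trio.lowlevel.wait_readable(sock)
            except trio.Cancelled:
                # Easy to forget: if we're cancelled while parked here, the
                # C library's in-flight operation needs to be cancelled too.
                lib_handle.cancel_pending_operations()
                raise
            lib_handle.process_readable()
```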
@thedrow we already have an issue for sd_notify and friends – let's discuss that over on #252. This thread is about examples to teach people about trio, and AFAIK there isn't anything pedagogically interesting about sd_notify – it's pretty trivial to implement in pure python, or if you have an existing C/Rust implementation you like then it's trivial to bind in python and the binding won't care what io library you're using.
Some examples of commonly-desired "custom supervisors" would be useful, e.g. the dual-nurseries trick in #569.
As mentioned here in gitter earlier today, when I was working with the Semaphore primitive I felt unsure of how I was using it. I'd appreciate some examples and documentation on the common use cases of the synchronization primitives shipped with trio and will try to help with this. There is existing documentation here that should be considered too.
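For instance, even a small worked example like this sketch of the "limit how many tasks run at once" pattern would have answered my question (fetch is hypothetical):

```python
import trio

async def fetch(url):
    await trio.sleep(0.1)  # stand-in for real network I/O

async def fetch_all(urls):
    # Classic Semaphore use case: allow at most 3 fetches to run at once.
    limit = trio.Semaphore(3)

    async def limited_fetch(url):
        async with limit:
            await fetch(url)

    async with trio.open_nursery() as nursery:
        for url in urls:
            nursery.start_soon(limited_fetch, url)
```

(For this particular pattern trio.CapacityLimiter is usually the recommended tool rather than Semaphore, but the shape is the same.)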
Here's a sketch for a web spider, including a tricky solution to figuring out when a circular channel flow is finished: https://gist.github.com/njsmith/432663a79266ece1ec9461df0062098d
Hey @njsmith I just tested your spider, and it seems there is an issue regarding the closing of the send channel clones. Here is an MWE with a proposed fix:
```python
import trio
import random
from collections import deque

WORKER_COUNT = 10

tasks = deque(i for i in range(103))
results = []

async def worker(worker_id, tasks, results, receive_chan):
    async def process(task):
        await trio.sleep(random.uniform(0, 0.1))
        return task

    async for send_chan, task in receive_chan:
        async with send_chan:
            result = await process(task)
            results.append((result, worker_id))
            if tasks:
                await send_chan.send((send_chan.clone(), tasks.popleft()))
                continue
            print('Worker {} reached an empty queue.'.format(worker_id))
            break

async def batch_job(tasks, results):
    send_chan, receive_chan = trio.open_memory_channel(float("inf"))
    for _ in range(WORKER_COUNT):
        await send_chan.send((send_chan.clone(), tasks.popleft()))
    async with trio.open_nursery() as nursery:
        for worker_id in range(WORKER_COUNT):
            nursery.start_soon(worker, worker_id, tasks, results, receive_chan)

trio.run(batch_job, tasks, results)
```
If I remove the `break` in the `async for send_chan, task in receive_chan` loop, the program hangs. Could you explain why exactly this fix works? Is there a more correct way to fix the issue?
You're not closing the original send_chan.
Also you fill your queue first and start the workers afterwards, which in a real program (i.e. with a non-infinite queue) isn't a terribly good idea.
Anyway, why is your code so complicated?
Simplified:
```python
# (imports and WORKER_COUNT as in the previous example)
import trio
import random
from collections import deque

WORKER_COUNT = 10

tasks = deque(range(103))
results = []

async def worker(worker_id, results, receive_chan):
    async def process(task):
        await trio.sleep(random.uniform(0, 0.1))
        return task

    async for task in receive_chan:
        result = await process(task)
        results.append((result, worker_id))
    print('Worker {} reached an empty queue.'.format(worker_id))

async def batch_job(tasks, results):
    send_chan, receive_chan = trio.open_memory_channel(WORKER_COUNT)
    async with trio.open_nursery() as nursery:
        async with send_chan:
            for worker_id in range(WORKER_COUNT):
                nursery.start_soon(worker, worker_id, results, receive_chan)
            for t in tasks:
                await send_chan.send(t)
            await send_chan.aclose()

trio.run(batch_job, tasks, results)
```
Note that you don't need (and don't want) an infinite queue; limiting it to the number of workers is more than sufficient.
@smurfix Thanks for the explanation. Could you please elaborate on why it is a bad idea to first fill the queue and then start the workers? I'm building a concurrent scraper that should roughly speaking be given a batch of urls and then scrape them concurrently. Because of retries some links may get added back to the queue. Yeah the code is more complicated than it should be, but I figured it's simple enough to get the point.
OK, yeah, if you're scraping then your original code makes more sense. ;-)
The point is that you should never use an infinite queue. Infinite queues tend to fill memory. Also, the point of a queue is to supply back-pressure: it tells the writers (i.e. your scraper) to slow down when the workers can't keep up. This, incidentally, significantly improves your chances of not getting blocked by the scraped sites.
OK, now you have 103 sites in your initial list, 10 workers, and a 20-or-however-many job queue. If you fill the queue first, the job doing that will stall, and since there's no worker running yet you get a deadlock.
Thank you for pointing out the issue with the infinite queue.
I don't understand how filling a regular deque can stall. The workers either add the response to the result or add the url back into the queue (of course there are various retry limits, timeouts etc...). You mentioned a "20-or-however-many job queue". I'm adding all my urls to the queue first. All other additions are done by the workers themselves in case of retries.
Regarding rate limiting I do have a mechanism based on timing, and I do not rely on any sort of queue for that if that's what you meant.
Edit: My use case involves a few thousand links at most, so I can afford to keep the entire queue in memory, i.e. I don't need a separate queue for workers to feed on and another to fill the first one.