tiled icon indicating copy to clipboard operation
tiled copied to clipboard

Use a thread pool in `download()`.

Open danielballan opened this issue 4 years ago • 2 comments

Ideally the download() function and methods would saturate the network. Most of the work involves reading data from sockets and writing to (disk) cache, operations which do not hold the GIL. Therefore, this is a good task for threads.

We may also want to pay attention to the size of the HTTP connection pool: https://www.python-httpx.org/advanced/#pool-limit-configuration

danielballan avatar Oct 23 '21 12:10 danielballan

Often, I find latency is a bigger overhead, especially for small chunks and/or distant network. async batching (which can massively oversubscribe versus threads) is much better at amortising latency. Is httpx's connection pool thread-safe? Requests' is not.

martindurant avatar Oct 26 '21 15:10 martindurant

Oh, that's useful. I have also observed that latency is the bigger issue (and improving the client-side cache has helped with that). I didn't know that async batching was better at amortizing latency.

danielballan avatar Nov 03 '21 12:11 danielballan