Improve concurrency documentation
While using sharp to compress a large number of images, CPU usage stayed at around 550% despite 14 cores being available. I read https://sharp.pixelplumbing.com/api-utility/#concurrency, but that seemed to indicate it should be working properly out of the box, as sharp.concurrency() was correctly returning 14.
After digging through some other issues in this repo, I eventually figured out that I could use my full CPU by setting process.env.UV_THREADPOOL_SIZE = 14. I suppose this is because there's a lot of overhead in using multiple threads on one image, so it's more efficient to process 14 separate images with a single thread each.
I think the documentation is misleading here: it implies that sharp uses multiple threads so efficiently that, when processing lots of images, 14 threads per image would be about as fast as 1 thread per image. In reality that doesn't appear to be the case; the latter is much faster. So I recommend the documentation make the user aware of this and tell them that, if they need to process multiple images efficiently, they should manually increase process.env.UV_THREADPOOL_SIZE.
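For reference, a rough sketch of what I ended up with (the file names and the pool size of 14 are specific to my setup):

```js
// UV_THREADPOOL_SIZE is read when libuv first creates its threadpool,
// so it must be set before any async work is queued.
process.env.UV_THREADPOOL_SIZE = '14';

const sharp = require('sharp');

const files = ['a.jpg', 'b.jpg', 'c.jpg']; // ...etc

async function compressAll() {
  // One pipeline per image; libuv can now run up to 14 of these
  // pipelines at the same time rather than the default 4.
  await Promise.all(
    files.map((file) =>
      sharp(file).jpeg({ quality: 80 }).toFile(`out-${file}`)
    )
  );
}

compressAll().catch(console.error);
```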
Did you see the following sentence?
https://sharp.pixelplumbing.com/api-utility/#concurrency
The maximum number of images that sharp can process in parallel is controlled by libuv’s UV_THREADPOOL_SIZE environment variable, which defaults to 4.
Perhaps we need to clarify what concurrency and parallelism mean in this context, and/or move this text to the performance section rather than under API?
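To spell out the distinction, here's an illustrative sketch of the two separate knobs (the values shown are examples, not recommendations):

```js
// Parallelism across images: the size of libuv's threadpool, i.e. how
// many images can be processed at the same time. Defaults to 4 and must
// be set before the threadpool is first used.
process.env.UV_THREADPOOL_SIZE = '14';

const sharp = require('sharp');

// Concurrency within one image: how many threads libvips may use for a
// single pipeline. Defaults to the number of CPU cores.
sharp.concurrency(1);
```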
I did see that; there was certainly some lack of insight on my part, but I also think the documentation could be clearer. I assumed that sharp was generally "doing smart things" with my CPUs and that it would be counterproductive for me to fiddle with it. But in reality it appears that the defaults are just really bad on any machine with more than 4 cores, and I think it would be good to mention that explicitly. (Or change the defaults, but I expect there was a good reason they were set that way in the first place.)
The thread pool size isn't really in sharp's control; it's up to libuv. I do wonder why it's so low. Are these threads more expensive than usual or something? Threads are supposed to be fairly cheap as long as they can be suspended.
Yes, the default size of the libuv (and therefore Node.js) threadpool is the limiting factor here. There have been a few attempts over the years to adopt a more dynamic approach to sizing, with the latest being https://github.com/libuv/libuv/pull/4415. If that lands then Node.js itself could be modified to dynamically size its threadpool to the core count, respecting cgroups etc.
Thanks, that PR looks very relevant. Immich's approach to this is to run a script before startup that sets this environment variable to the number of CPU cores (respecting cgroups).
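For anyone else landing here, a minimal sketch of that idea done inside Node itself rather than a separate script (this is not Immich's actual code; os.availableParallelism() needs Node >= 18.14, and whether it honours cgroup CPU limits depends on the bundled libuv version):

```js
const os = require('node:os');

// Must run before anything touches the libuv threadpool, e.g. before
// require('sharp') or any fs/dns work.
process.env.UV_THREADPOOL_SIZE = String(os.availableParallelism());
```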
Related, there's a section in the docs:
For example, by default, a machine with 8 CPU cores will process 4 images in parallel and use up to 8 threads per image, so there will be up to 32 concurrent threads.
I personally only see high utilization when scaling the number of images processed at once, and it seems to scale at about one core per concurrent image. Are the 8 threads mentioned here only used in certain situations?
Commit https://github.com/lovell/sharp/commit/e688c536591df712001933be1ca4d5c685199f87 should make the docs clearer. These will be re-published along with the next release.
I personally only see high utilization when scaling the number of images processed at once, and it seems to scale at about one core per concurrent image. Are the 8 threads mentioned here only used in certain situations?
It's format (input and output) dependent; I've removed the "up to 32 concurrent threads" bit to avoid confusion.
The updated docs have now been published as part of v0.34.3 - please see https://sharp.pixelplumbing.com/performance/