image icon indicating copy to clipboard operation
image copied to clipboard

Better parallelism controls

Open Shnatsel opened this issue 1 year ago • 10 comments

image only provides the rayon compile-time feature for controlling parallelism. There are no runtime controls exposed, which means there isn't an obvious way to control things like:

  1. Whether format encoders/decoders will use multi-threading (OpenEXR, AVIF). Multi-threading in batch workloads that saturate all cores anyway would only cause slowdowns.
  2. Whether certain image operations will use parallelism internally. For example, resize #2223 and blur #986 could benefit, but unconditionally enabling parallelism would degrade performance for small images so we aren't using parallelism right now.

Shnatsel avatar Oct 21 '24 21:10 Shnatsel

  1. This is particularly tricky because methods like image::open or DynamicImage::save don't provide knobs to precisely control behavior, but I'm not thrilled about just defaulting to single-threaded implementations unless users go out of their way to request multithreading. For codecs, maybe the right thing to do is to add overrides on ImageEncoder / ImageDecoder to force single-threading for folks who need it, and otherwise base on whether the rayon feature is enabled?
  2. I think the decision of whether an image is too small to benefit from parallelism should be within the library. Afterall, we're probably the best suited to measure what the fixed overheads are.

fintelia avatar Oct 21 '24 21:10 fintelia

I'd also prefer to delegate decision to library how many threads to use, and if its even worth to the library, with simple flag if its enabled or no, with multi-threading enabled by default.

awxkee avatar Oct 21 '24 23:10 awxkee

I'd like to be able to limit max number of cores. I've got servers with 128 cores, and each library creating a threadpool for the entire core count makes everything an overkill.

kornelski avatar Oct 29 '24 12:10 kornelski

For deployment-time rather than coding-time thread count customization, I believe the RAYON_NUM_THREADS environment variable documented here would be a good option.

Shnatsel avatar Oct 29 '24 13:10 Shnatsel

Also, I'd expect we'd use the rayon global thread pool rather than creating our own. That seems like it would mitigate the issue by sharing the same threads with any other library that used rayon

fintelia avatar Oct 29 '24 17:10 fintelia

But I'd like to use max 4 cores for each image being decoded (there are diminishing results from splitting decoding further, and I don't want one job to monopolize the thread pool), but not have the whole 128-core server limited to 4 cores.

kornelski avatar Oct 31 '24 14:10 kornelski

Wait, doesn't rayon default to using the current thread pool? So if you call image::open from within a 4 thread pool, any parallel operations would be limited to those thread unless we went out of our way to use a different pool?

You'd have to keep track of a bunch of different thread pools, but that seems unavoidable if you're trying to subdivide 128 cores

fintelia avatar Oct 31 '24 19:10 fintelia

I've been thinking about this now that we can make semver-breaking changes again.

I don't think we will be able to get around letting people pass a custom Rayon thread pool. We will have to do it eventually to enable some use cases, so might as well do it now. I agree defaulting to parallelism for decoding/encoding is a good idea. I want that to be frictionless, so we can add optional methods to ImageDecoder and ImageEncoder traits to control that, just like we added metadata. Non-Rayon implementations can still reuse the settings from the Rayon thread pool (e.g. number of threads), see #2664 for a motivating use case.

The API for image operations is a little trickier. We definitely want parallelism for things like resize, blur, etc. Even run of the mill operations like contrast would benefit for larger images. We could make versions of existing functions prefixed with par_ which is the Rayon convention, see par_iter() and par_chunks() in Rayon. They would accept a Rayon thread pool as one of the arguments. I'm not in love with this idea but it seems to be simple and stupid and easy enough to use. The alternative would be making all those ops generic on a struct that indicates whether parallelism is enabled or not, contains the Rayon threadpool, and you'd construct it once and then reuse across image operations. This is less easy to use but wouldn't have two versions of a function for the same operation.

Thoughts?

Shnatsel avatar Nov 30 '25 01:11 Shnatsel

Do we need to take the thread pool as an argument? I'm not super familiar with the low-level details of rayon, but it looks like they expose a method to query whether the current function is running within a thread pool (and what thread index if so), and another method to query the number of threads in the current thread pool (or the number of threads in the global thread pool if not called from within one).

And from what I understand, any of rayon's parallel iterators and such implicitly use the current thread pool if there is one.

fintelia avatar Nov 30 '25 02:11 fintelia

That's also been my thought. We only need a ThreadPool argument if the work should not happen transparently in-scope or if we would internally create a new thread that we want to be aware of the target pool regardless.

In /previous job/ a similar situation with an implicit thread-pool showed good benchmark tests but atrocious production numbers. The cause was that the tasks spawned from one client-connection could flood the pool and delay the task responsible for spawning the fleet of subtasks of another client-connection. But since there was quite a bit of work to be done before any task was subdivided this mainly drove latency numbers up and had rather poor utilization overall. Almost like the system starved itself for periods at a time. Ultimately, using a pool task for scheduling others—without any idea if all tasks are roughly similar—is suspect to me since then.

197g avatar Nov 30 '25 03:11 197g