picos icon indicating copy to clipboard operation
picos copied to clipboard

Poor scaling on random scheduler

Open Drup opened this issue 5 months ago • 3 comments

I run a benchmark recently for parallel map on trees. It makes for an interesting embarrassingly parallel (recursive) task. The core function is the map_par_full function below:

open Picos

type 'a tree = N of 'a * 'a tree * 'a tree | L

let async e =
  let p = Computation.create () in
  let fiber = Fiber.create ~forbid:false p in
  Fiber.spawn fiber (fun _ -> Computation.return p @@ e ());
  p

let rec map_par_full f t = async @@ fun () -> match t with
  | L -> L
  | N (v , tl , tr ) ->
    let task_tl = map_par_full f tl in
    let task_tr = map_par_full f tr in
    N (f v,
       Computation.await task_tl,
       Computation.await task_tr)

I run this with Picos' and Moonpool's schedulers. Here are the results for a tree of size 10000 with medium-sized tasks on a Intel Xeon E5-2630 (so, 12 cores * 2 hyperthreading). Time is normalized to a non-async baseline.

Image

The two immediate conclusions are that

  1. Multithreading is crap on this workload (welp)
  2. The scaling for Picos' multififo is surprisingly good!
  3. The scaling for Picos' random scheduler seems ... buggy ? There's definitely some sort of issue there.

In general, given the embarrassingly parallel nature of the workload, I expected random to be much better than multififo, and this is not the case at all.

cc https://github.com/c-cube/moonpool/issues/38 on the moonpool's bug tracker.

Drup avatar Jul 23 '25 13:07 Drup

It is great to see people experiment with these!

The multififo scheduler was relatively recently improved for performance in a couple of important ways (making spawning less expensive and balancing load) and it uses one of the fastest MPMC queues for OCaml. For CPU bound parallel work one really wants an unfair mostly LIFO scheduler (i.e. work-stealing deque) and a scheduler such as the multififo, which aims to be fair, will not be quite as performant for purely CPU bound parallelism.

The randomized scheduler is intended for testing and has not received that much attention to performance. I'm sure it could be improved, but it really is not intended for things where performance matters.

polytypic avatar Jul 23 '25 19:07 polytypic

Then, the work on the multififo scheduler definite paid off. :) I agree though, this kind of workload is really made to run on workstealing!

I would suggest you improve the documentation a bit, by emphasising a little bit more which scheduler you consider "good to use", and for which use cases. It definitely wasn't clear to me that the random scheduler was not supposed to be efficient.

Also, I think it would be nice to indicate explicitly (at the beginning of the readme and in the docs, maybe?) other picos-compatible libraries currently usable.

Drup avatar Jul 23 '25 19:07 Drup

PRs to add links to other Picos-compatible libraries would be welcome!

Regarding "good to use", yes, there are some notes in that regard, but it could be improved. I've actually been trying to avoid "selling" the sample libraries in Picos too much, because the whole point is interoperability rather than the specific sample libraries, but I have also put some effort into many of them to make sure things can be given nice APIs and be implemented efficiently.

polytypic avatar Jul 24 '25 06:07 polytypic