
Multiprocessing is slow

Open danuker opened this issue 3 years ago • 5 comments

Multiprocessing is very slow, compared to evolving one generation.

One 50-individual generation evolves in around 10 ms, whereas starting a new process takes 100-200 ms.

Still, in benchmarks, small generations with frequent feedback are what work best.

There is no obvious way to parallelize with threads, because the GIL would serialize the CPU-intensive work.

Think about what can be done.
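To ground the numbers above, here is a minimal sketch of how the overhead could be measured: it times one in-process evaluation against one evaluation in a fresh process. The `evaluate_generation` workload is a hypothetical stand-in, not symreg's actual fitness code.

```python
import time
from multiprocessing import Process

def evaluate_generation():
    # Stand-in for evaluating one 50-individual generation (hypothetical workload).
    return sum(i * i for i in range(50_000))

def measure():
    """Time one in-process evaluation vs. one evaluation in a fresh process."""
    t0 = time.perf_counter()
    evaluate_generation()
    in_process = time.perf_counter() - t0

    t0 = time.perf_counter()
    p = Process(target=evaluate_generation)  # pays the process-startup cost
    p.start()
    p.join()
    fresh_process = time.perf_counter() - t0
    return in_process, fresh_process

if __name__ == "__main__":
    in_t, fresh_t = measure()
    print(f"in-process: {in_t * 1e3:.2f} ms, fresh process: {fresh_t * 1e3:.2f} ms")
```

The gap between the two timings is the per-generation penalty of forking anew each time.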

danuker avatar Dec 04 '20 18:12 danuker

Posted on Reddit. ~~TODO: Try out multithreading using Numba or Cython.~~ Tried Numba; it needs too many changes. I can't use lambdas, for instance.

danuker avatar Dec 16 '20 18:12 danuker

Why not have 5000-individual-sized population? That would be easy to generate locally on each process, wouldn't it?

The reason I'm hesitant about that is that if you only perform selection after evaluating 5000 individuals, you spend lots of CPU creating irrelevant individuals instead of deriving new ones from those already proven fit. I must design a benchmark that captures this effect, and compare with PySR.

danuker avatar Dec 27 '20 20:12 danuker

Consider using Twisted (and Deferreds):

Instead of forking every generation (which spends lots of time creating new processes), create a pool of long-running processes once at the start, and only send them solutions to evaluate.
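The pool idea does not require Twisted; the stdlib alone can express it. Below is a minimal sketch: workers start once, and each generation only sends the population through `pool.map`. The `evaluate` fitness function and population shape are hypothetical placeholders.

```python
from multiprocessing import Pool

def evaluate(individual):
    # Hypothetical stand-in fitness function: score one candidate solution.
    return sum(x * x for x in individual)

def evolve(generations=3, pop_size=50):
    # Create the worker pool once, then reuse it every generation,
    # instead of paying process-startup cost per generation.
    with Pool(processes=4) as pool:
        population = [[i, i + 1] for i in range(pop_size)]
        scores = []
        for _ in range(generations):
            scores = pool.map(evaluate, population)
            # ...selection/mutation would go here; omitted in this sketch.
        return scores

if __name__ == "__main__":
    print(evolve()[:3])
```

The per-generation cost then reduces to serializing the individuals and their scores, which is what points 2 and 3 below set out to measure.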

To evaluate performance, use 50 individuals per generation:

  1. Solutions per second on one CPU without a server in between (as currently implemented)
  2. Solutions per second on one CPU with a server in between (serialization, requests, and deserialization).
    • Profile and minimize sending of resources
    • Evaluate if it's worth continuing
  3. Solutions per second on multiple CPUs with a server
    • Estimate and maximize p and speedup in the context of Amdahl's law
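For reference, Amdahl's law predicts the speedup on n CPUs when a fraction p of the work parallelizes; the goal in point 3 is to push p as close to 1 as possible. A one-liner for the estimate:

```python
def amdahl_speedup(p, n):
    """Amdahl's law: speedup on n CPUs when fraction p of the work is parallel."""
    return 1.0 / ((1.0 - p) + p / n)

# e.g. with p = 0.9 on 4 CPUs, the best achievable speedup is only ~3.08x,
# because the serial 10% dominates as n grows.
```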

Research:

  • https://www.artima.com/weblogs/viewpost.jsp?thread=230001
  • https://docs.twistedmatrix.com/en/twisted-21.2.0/core/howto/defer.html#deferredlist
  • https://docs.twistedmatrix.com/en/twisted-21.2.0/core/howto/process.html
  • t.i.t.deferToThread (twisted.internet.threads.deferToThread)
  • https://www.techempower.com/benchmarks/#section=data-r13&hw=ph&test=json&l=zijzen-sf&w=zijocf-zik0vz-2hwcf
  • https:///@shmulikamar/python-serialization-benchmarks-8e5bb700530b - MsgPack?
  • Dask (credit: https://www.youtube.com/watch?v=zQeYx87mfyw)

danuker avatar Mar 30 '21 16:03 danuker

Another suggestion: use Ray. I haven't used it myself yet, but I keep hearing about it more and more often.

Otherwise, I believe the way to go is to start several processes in the beginning and send them work afterwards.
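The "start processes once, send work afterwards" approach also exists in the stdlib as `concurrent.futures.ProcessPoolExecutor`, which feeds long-lived workers over a queue; a Ray version would look structurally similar. A sketch, with a hypothetical `evaluate` fitness function:

```python
from concurrent.futures import ProcessPoolExecutor, as_completed

def evaluate(individual):
    # Hypothetical fitness function: score one candidate solution.
    return sum(x * x for x in individual)

def evaluate_population(population, workers=4):
    # Worker processes start once; each submit() only ships one work item,
    # so there is no per-generation process-startup cost.
    results = {}
    with ProcessPoolExecutor(max_workers=workers) as pool:
        futures = {pool.submit(evaluate, ind): i for i, ind in enumerate(population)}
        for fut in as_completed(futures):
            results[futures[fut]] = fut.result()
    return [results[i] for i in range(len(population))]

if __name__ == "__main__":
    print(evaluate_population([[1, 2], [3, 4]]))
```

`as_completed` returns results as workers finish, which also leaves room for the frequent-feedback selection the benchmarks favor.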

rolisz avatar May 07 '21 08:05 rolisz

Thanks for visiting my humble project, and for the cool suggestion! I believe my next move will be to compare Dask and Ray.

If that doesn't work, I am thinking of better-architected multiprocessing: instead of starting processes each generation, I would start a pool of N long-running servers, and:

  • use a pre-allocated large-enough ShareableList of the individuals to evaluate
    • I'd have to allocate a list of large initial individuals, to reserve enough memory, and watch out for ValueErrors:
      • ValueError: exceeds available storage for existing str
  • use a pre-allocated large-enough ShareableList for the output, where each child writes its result at its own index.
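A minimal sketch of the pre-allocation idea: ShareableList fixes each slot's storage at creation time, so reserving slots with large placeholder strings is what makes later in-place writes possible, and oversized writes raise the ValueError mentioned above. The placeholder size and the expression string are illustrative assumptions.

```python
from multiprocessing import shared_memory

# Reserve each slot's capacity up front with a large placeholder string;
# ShareableList fixes per-slot storage at creation time.
PLACEHOLDER = "x" * 64
sl = shared_memory.ShareableList([PLACEHOLDER] * 4)

sl[0] = "add(x0, x1)"       # fits: shorter than the placeholder
try:
    sl[1] = "y" * 200       # exceeds the reserved slot size
except ValueError as e:
    overflow_error = str(e)  # the "exceeds available storage" error

first = sl[0]
sl.shm.close()
sl.shm.unlink()
```

Workers attaching by name (`shared_memory.ShareableList(name=...)`) would then read their inputs and write results into their own index without any serialization round-trip.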

danuker avatar May 07 '21 08:05 danuker