
Parallelization

Open wildart opened this issue 5 years ago • 11 comments

Consider parallelization of the algorithms in multiple modes:

  • [x] Single process (core)
  • [x] Multi-threading
  • [ ] Multi-process (multi-core)

wildart avatar Mar 18 '20 19:03 wildart

Despite the major changes I tried to pull in #43, parallelization of the entire population is not that hard to implement using the DistributedArrays package. I already have a prototype that works well with several processes on the same computer, and I almost have a way to easily incorporate this into a cluster. If I have time I can add this prototype in another PR.

tpdsantos avatar Mar 18 '20 19:03 tpdsantos

I do not think that DistributedArrays is the answer. It covers only the multi-core scenario. Even the basic Julia parallel computing routines are enough for the scatter-gather computations that evolutionary algorithms require.

My goal is to make some sort of universal parallelization pipeline that can be configured for a specific computational topology and then used to run any evolutionary algorithm. In order to do that, all parallelizable parts of the evolutionary algorithms need to be self-contained, side-effect-free functions, similar to #26 or the ga part of #43.
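As a rough illustration of the scatter-gather idea with only the stdlib Distributed module (the objective and population here are stand-ins, not part of any package API):

```julia
using Distributed

# Hypothetical objective and population, purely for illustration.
fitness(x) = sum(abs2, x)
population = [rand(5) for _ in 1:100]

# pmap scatters individuals across the available workers and gathers the
# fitness values back; with no extra workers it simply runs serially.
scores = pmap(fitness, population)
```

Adding workers with `addprocs` (and defining `fitness` on them via `@everywhere`) is all it takes to turn this into a multi-process evaluation.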

The computational pipeline should have a simple interface and comprehensible syntax,

input |> ga(fitness = objFunc, mutation = inversion) |> Distributed(ncores = 10) 

or maybe even some simple DSL,

@local ga(input, mutationRate = 0.2, tolIter = 20) do
    population |> roulette |> inversion |> offspring
end

wildart avatar Mar 18 '20 20:03 wildart

> I do not think that DistributedArrays is the answer. It covers only the multi-core scenario. Even the basic Julia parallel computing routines are enough for the scatter-gather computations that evolutionary algorithms require.

I understand what you're saying, but the Distributed package also deals only with the multi-core case. You could use something like Base.Threads, but that wouldn't be easy at all, since the ga function would need major changes.

tpdsantos avatar Mar 18 '20 20:03 tpdsantos

> but that wouldn't be easy at all, since the ga function would need major changes.

Which you are already doing in #43 :wink:

Anyway, I think the right approach would be to start compartmentalizing the code of the evolutionary functions.

wildart avatar Mar 18 '20 21:03 wildart

#49 should provide an easier way of implementing parallelized versions of the existing algorithms by introducing a series of new states with appropriate parallel update_state! implementations.

wildart avatar Apr 23 '20 23:04 wildart

An easy and useful approach to parallelisation is to ask the user (caller) to calculate the fitness for many individuals at once. The user can then parallelise that call themselves (using whatever means is appropriate for their machine), and minimal changes are required in this package. In other words, I am suggesting that the user provide a function which internally iterates over the population. In a simple case I could already use this directly and parallelise through e.g. a call to pmap.
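A minimal sketch of that batch-fitness interface (the function name and objective are hypothetical; the point is that the caller owns the parallelisation):

```julia
using Distributed

# Hypothetical caller-supplied batch fitness: the package would hand over
# the whole population, and the user parallelises internally as they wish.
function batch_fitness(population)
    pmap(population) do individual
        sum(abs2, individual)   # stand-in per-individual objective
    end
end

population = [rand(3) for _ in 1:8]
scores = batch_fitness(population)
```

The same `batch_fitness` body could instead use `Threads.@threads`, `asyncmap`, or a GPU kernel without the package knowing or caring.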

jtravs avatar Nov 24 '20 22:11 jtravs

> calculate the fitness for many individuals at once.

That might work. Currently, the fitness evaluation is done by a value call with the objective and the individual as parameters. If a broadcast (in-place) version of it were introduced to perform bulk evaluation, it could be overloaded for a specific individual type to provide a concurrent version.
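A sketch of what such an overloadable bulk evaluation could look like, using only stdlib threads. All names here (evaluate!, BitIndividual) are illustrative, not the package's actual API:

```julia
# Hypothetical individual type used to demonstrate dispatch.
struct BitIndividual
    genes::BitVector
end

# Generic serial fallback: in-place bulk evaluation over any population.
evaluate!(fitness, objective, pop::AbstractVector) =
    map!(objective, fitness, pop)

# Concurrent override for a specific individual type.
function evaluate!(fitness, objective, pop::AbstractVector{BitIndividual})
    Threads.@threads for i in eachindex(pop)
        fitness[i] = objective(pop[i])
    end
    return fitness
end

objective(ind) = count(ind.genes)   # stand-in objective: number of set bits
pop = [BitIndividual(BitVector(rand(Bool, 8))) for _ in 1:10]
fitness = Vector{Int}(undef, length(pop))
evaluate!(fitness, objective, pop)
```

Dispatching on the population's element type is what lets a concurrent method be swapped in per representation without touching the algorithm code.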

wildart avatar Dec 01 '20 01:12 wildart

Hi, nice package!

Has there been any update on the parallelisation?

gasagna avatar Jul 07 '21 17:07 gasagna

I added a simple override for multi-threaded fitness evaluation: https://wildart.github.io/Evolutionary.jl/dev/tutorial/#Parallelization. Look up the dev part of the documentation for information on creating additional overrides for parallel fitness evaluation: https://wildart.github.io/Evolutionary.jl/dev/dev/#Parallelization.

  • Note: This parallelization implementation works better if the fitness function requires considerable computational resources.
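The note above can be illustrated with a stdlib-only sketch: threads only pay off when each fitness call is expensive enough to amortise the scheduling overhead (the sleep below simulates a costly objective; names are illustrative):

```julia
# Simulated expensive objective: each call blocks for 10 ms.
expensive_fitness(x) = (sleep(0.01); sum(abs2, x))

population = [rand(2) for _ in 1:8]

# Serial evaluation: roughly 8 x 10 ms of wall time.
serial = map(expensive_fitness, population)

# Threaded evaluation: the same calls spread over available threads,
# so wall time shrinks when Julia is started with multiple threads.
threaded = Vector{Float64}(undef, length(population))
Threads.@threads for i in eachindex(population)
    threaded[i] = expensive_fitness(population[i])
end
```

For a cheap objective the per-task overhead of spawning and synchronising threads can easily exceed the evaluation cost itself, which is the caveat the note is making.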

wildart avatar Jul 16 '21 19:07 wildart

Hi!

I'm trying to implement parallelization following the first link, but when I download the package and check Evolutionary.Options(), parallelization is not listed. Actually, "rng" and "callback" are also missing. Do you know what might be the reason? Thanks!

nguyentmanh avatar May 14 '22 05:05 nguyentmanh

Just want to follow up: allowing value functions to use Distributed would be a great addition.

I think the threading helps, but it doesn't allow you to evaluate multiple samples from the population simultaneously across processes.

mfogelson avatar Feb 23 '23 21:02 mfogelson