HelmholtzMedia

go parallel

Open thorade opened this issue 12 years ago • 4 comments

I don't see how to do this in Modelica, but going parallel could significantly speed things up: all 5 to 14 HelmholtzDerivs can be calculated simultaneously (see HelmholtzDerivs and setHelmholtzDerivs), and all 12 to 50 terms of each HelmholtzDeriv could be evaluated simultaneously (see e.g. f_r).

Combining these two should give 60 or more independent threads. This should be investigated in combination with (automatic?) common subexpression elimination, because the f_r etc. function calls have very many terms in common! Maybe combine with #26?
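To illustrate the idea (not in Modelica, and with made-up term functions standing in for the polynomial, exponential, and Gaussian terms of f_r), a minimal Python sketch: each term depends only on the reduced density and temperature (delta, tau) and not on the other terms, so the terms can be submitted to a thread pool independently and summed afterwards.

```python
from concurrent.futures import ThreadPoolExecutor
import math

# Hypothetical stand-ins for independent terms of f_r.
# Each term reads only (delta, tau), never another term's result.
def term_poly(delta, tau):
    return 0.5 * delta * tau ** 0.25

def term_exp(delta, tau):
    return delta ** 2 * tau * math.exp(-delta)

def term_gauss(delta, tau):
    return delta * tau * math.exp(-(delta - 1.0) ** 2 - (tau - 1.0) ** 2)

TERMS = (term_poly, term_exp, term_gauss)

def f_r_serial(delta, tau):
    """Reference: evaluate all terms sequentially and sum."""
    return sum(t(delta, tau) for t in TERMS)

def f_r_parallel(delta, tau):
    """Same sum, but each term is evaluated in its own worker thread."""
    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(t, delta, tau) for t in TERMS]
        return sum(f.result() for f in futures)
```

Whether this pays off depends on the per-term cost versus the thread-dispatch overhead, which is exactly the concern raised later in this thread.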

thorade avatar Jan 27 '13 17:01 thorade

There are many other places where two or more functions could be called in parallel:

  • more sum() operators: https://github.com/thorade/HelmholtzMedia/blob/7b867a59b0556b443b2ad463bf3d21d9e1d5537d/HelmholtzMedia/Interfaces/PartialHelmholtzMedium/Ancillary/saturationPressure_T.mo#L21
  • setSat functions: https://github.com/thorade/HelmholtzMedia/blob/7b867a59b0556b443b2ad463bf3d21d9e1d5537d/HelmholtzMedia/Interfaces/PartialHelmholtzMedium/setSat_d.mo#L42-L43

thorade avatar Sep 09 '20 06:09 thorade

@mahge, do you think we can exploit such a fine-grained parallelism? I'm afraid the overhead could kill any potential speedup.

casella avatar Sep 09 '20 16:09 casella

I cannot say much without looking at it further. However, the design and implementation are intended for fine-grained parallelism, i.e., at the equation level instead of just at the level of strongly connected components. Unfortunately, it will not yet go down into functions and parallelize things there.

The good news is that if these large functions (computations) are called from equations that can be computed independently of each other within a single time step, they should be parallelizable. In other words, consider each call to these functions from different equations as part of that equation's computation. If, after causalization, one assignment does not use the left-hand side of the other equation, it is all the same to the implementation and we should be able to run them in parallel.
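The independence criterion described above can be sketched as a toy Python dependency check (the equations and the dictionary representation are hypothetical; a real Modelica tool works on the causalized equation graph): two causalized assignments may run in parallel when neither one reads the other's left-hand side.

```python
from concurrent.futures import ThreadPoolExecutor

# Two causalized assignments, eq1: x := a^2 and eq2: y := 3*b,
# represented as hypothetical records for illustration only.
eq1 = {"lhs": "x", "rhs_vars": {"a"}, "fn": lambda env: env["a"] ** 2}
eq2 = {"lhs": "y", "rhs_vars": {"b"}, "fn": lambda env: 3.0 * env["b"]}

def independent(e1, e2):
    """True if neither equation reads the other's left-hand side."""
    return e1["lhs"] not in e2["rhs_vars"] and e2["lhs"] not in e1["rhs_vars"]

env = {"a": 2.0, "b": 5.0}
if independent(eq1, eq2):
    # Safe to evaluate both right-hand sides concurrently.
    with ThreadPoolExecutor() as pool:
        fx = pool.submit(eq1["fn"], env)
        fy = pool.submit(eq2["fn"], env)
        env["x"], env["y"] = fx.result(), fy.result()
else:
    # Data dependency: fall back to sequential evaluation.
    env["x"] = eq1["fn"](env)
    env["y"] = eq2["fn"](env)
```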

As for the sum() operators and similar data-parallel computations within functions/algorithms, there is another parallelization implementation I did a while back that can handle them, even on GPUs. However, it requires modifications to the library source code, making it unusable with other Modelica tools. Also, the arrays/computations need to be quite large (by Modelica standards) to see any speedup. We can look at that afterwards if you are interested.

mahge avatar Sep 09 '20 19:09 mahge

> I cannot say much without looking at it further. However, the design and implementation are intended for fine-grained parallelism, i.e., at the equation level instead of just at the level of strongly connected components.

OK.

> Unfortunately, it will not yet go down into functions and parallelize things there.

I guess this issue could be solved by clever generation of auxiliary variables. We already have some kind of common subexpression elimination on functions, carried out by wrapFunctionCalls, which generates an auxiliary equation $cseNN = f(...); for each function call in the model and then uses $cseNN in place of that call within the equations. Maybe this could be good enough to get separate function calls to run in parallel.
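The effect of this kind of CSE can be sketched in Python (the model and the function f are hypothetical; the $cseNN naming comes from wrapFunctionCalls as described above). Counting evaluations of f makes the saving visible: three textual occurrences of f(delta, tau) collapse into one auxiliary variable.

```python
# Track how many times f is actually evaluated.
calls = {"count": 0}

def f(delta, tau):
    calls["count"] += 1
    return delta * tau

def model_without_cse(delta, tau):
    # f(delta, tau) appears in three equations and is evaluated 3 times.
    eq1 = f(delta, tau) + 1.0
    eq2 = 2.0 * f(delta, tau)
    eq3 = f(delta, tau) ** 2
    return eq1, eq2, eq3

def model_with_cse(delta, tau):
    # Conceptually what wrapFunctionCalls emits: one auxiliary
    # equation cse1 = f(delta, tau), reused by all three equations.
    cse1 = f(delta, tau)
    eq1 = cse1 + 1.0
    eq2 = 2.0 * cse1
    eq3 = cse1 ** 2
    return eq1, eq2, eq3
```

Beyond avoiding recomputation, each $cseNN assignment is a separate equation after causalization, which is what could expose the independent function calls to the equation-level parallelization discussed here.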

> The good news is that if these large functions (computations) are called from equations that can be computed independently of each other within a single time step, they should be parallelizable.

Yes, that is the point.

> In other words, consider each call to these functions from different equations as part of that equation's computation. If, after causalization, one assignment does not use the left-hand side of the other equation, it is all the same to the implementation and we should be able to run them in parallel.

> As for the sum() operators and similar data-parallel computations within functions/algorithms, there is another parallelization implementation I did a while back that can handle them, even on GPUs. However, it requires modifications to the library source code, making it unusable with other Modelica tools. Also, the arrays/computations need to be quite large (by Modelica standards) to see any speedup. We can look at that afterwards if you are interested.

Yeah, I guess the arrays are not large enough for us to benefit from that. After all, a double-precision addition takes one clock cycle on modern CPUs (or even less on superscalar architectures), so if you only need to sum a few dozen numbers, going parallel probably doesn't make sense.

casella avatar Sep 09 '20 20:09 casella