Generic executor

Open tcojean opened this issue 4 years ago • 3 comments

Adds a Generic Executor to dynamically select any concrete Executor

The behavior of the GenericExecutor is controlled through three parameters, which can either be passed to the constructor or set as environment variables. The controls, in their environment-variable form, are:

  • GINKGO_GENERIC_EXEC_TYPE : a specific executor type to target. One of "cuda", "hip", "dpcpp", "omp", "reference", or the default "all".
  • GINKGO_GENERIC_EXEC_ID : a specific device ID to target. The default, -1, accepts any ID.
  • GINKGO_GENERIC_EXEC_AUTO : can be set to 0 or 1. It controls whether subsequent calls provide the next executor in the list or, as is the default, the same one.
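
As a rough illustration of how the three environment variables above could be read, here is a minimal standalone sketch. It does not use Ginkgo and is not the code of this PR: the struct generic_exec_options, its field names, and the functions is_valid_exec_type and read_generic_exec_options are hypothetical, and it assumes that GINKGO_GENERIC_EXEC_AUTO=1 selects the "next executor" behavior while any other value keeps the default.

```cpp
#include <cstdlib>
#include <stdexcept>
#include <string>

// Hypothetical options holder; names are illustrative, not the PR's API.
struct generic_exec_options {
    std::string type = "all";  // mirrors GINKGO_GENERIC_EXEC_TYPE
    int device_id = -1;        // mirrors GINKGO_GENERIC_EXEC_ID, -1 = any ID
    bool auto_next = false;    // mirrors GINKGO_GENERIC_EXEC_AUTO (assumed: 1 = next)
};

// Returns true if `type` is one of the values listed in the description.
inline bool is_valid_exec_type(const std::string& type)
{
    static const char* const known[] = {"cuda", "hip",       "dpcpp",
                                        "omp",  "reference", "all"};
    for (const char* name : known) {
        if (type == name) {
            return true;
        }
    }
    return false;
}

// Reads the three variables, keeping the defaults for any that are unset.
inline generic_exec_options read_generic_exec_options()
{
    generic_exec_options opts;
    if (const char* type = std::getenv("GINKGO_GENERIC_EXEC_TYPE")) {
        if (!is_valid_exec_type(type)) {
            throw std::invalid_argument("unknown executor type: " +
                                        std::string(type));
        }
        opts.type = type;
    }
    if (const char* id = std::getenv("GINKGO_GENERIC_EXEC_ID")) {
        opts.device_id = std::stoi(id);  // throws if the value is not a number
    }
    if (const char* auto_flag = std::getenv("GINKGO_GENERIC_EXEC_AUTO")) {
        opts.auto_next = std::string(auto_flag) == "1";
    }
    return opts;
}
```

With none of the variables set, this sketch returns the defaults ("all", -1, same executor on every call); exporting GINKGO_GENERIC_EXEC_TYPE=cuda and GINKGO_GENERIC_EXEC_ID=2 before launching the program would fill in those two fields instead.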

In detail, the changes are:

  • Fix ginkgo-overhead example to use 1.0 instead of NaN (unrelated)
  • Add a new executor base, FakeExecutorBase, to simplify the implementation of this kind of executor.
  • Add the GenericExecutor with its three control parameters. The create function manages the environment variables, whereas the constructor only uses flags.
  • The only more involved part of the GenericExecutor is the "all" behavior, which by default goes through CUDA, HIP, and DPC++, in that order, to check whether any GPU is available, and otherwise falls back to either Omp or Reference, whichever is enabled.
  • Use get_concrete_executor() in PolymorphicObject to make all Ginkgo objects work transparently with this new kind of executor.
  • Add tests and an example based on simple-solver to show users how to use this new executor type and let them experiment with the environment variables.
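
To make the "all" fallback order more concrete, the following is a minimal standalone sketch of a first-match selection over a priority-ordered backend list. It is not the implementation in this PR: backend_info, selection, and select_executor are hypothetical names, a disabled backend is modelled as having zero devices, and how a specific device ID interacts with "all" is an assumption.

```cpp
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical description of one backend as seen at run time.
struct backend_info {
    std::string name;  // "cuda", "hip", "dpcpp", "omp" or "reference"
    int num_devices;   // usable devices; 0 if the backend is disabled or absent
};

// Hypothetical result: which backend and which device were picked.
struct selection {
    std::string type;
    int device_id;
};

// Walks `backends` in the given priority order and returns the first entry
// that matches the requested type ("all" matches every backend) and the
// requested device ID (a negative ID matches any device, device 0 is used).
inline selection select_executor(const std::vector<backend_info>& backends,
                                 const std::string& type, int device_id)
{
    for (const auto& backend : backends) {
        if (type != "all" && type != backend.name) {
            continue;  // a specific type was requested and this is not it
        }
        if (backend.num_devices <= 0) {
            continue;  // nothing usable here, try the next backend
        }
        if (device_id < 0) {
            return {backend.name, 0};
        }
        if (device_id < backend.num_devices) {
            return {backend.name, device_id};
        }
    }
    throw std::runtime_error("no executor matches the request");
}
```

Passing the list in the order cuda, hip, dpcpp, omp, reference reproduces the order described above: a GPU backend wins if it reports at least one device, and the CPU backends are reached only when none does.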

Some possible issues:

  • If HIP is also CUDA, does the behavior of auto_different_exec become hard to code?
  • Is the FakeExecutorBase useful to the MPI executor, and what changes would be needed?
  • Are there more convenience interface functions we might want to add to this executor?
  • For now, only executors created with the GenericExecutor are tracked for occupancy. Is there a way to improve the hwloc interaction and information logging to track generic GPU availability, so that the GenericExecutor or its underlying facilities could be reused to track available devices?
  • Are there any other behavior controls we want to implement?

TODO:

  • [ ] Double check all tests.

tcojean · Feb 09 '21 09:02

format!

tcojean · Feb 09 '21 09:02

format!

tcojean · Feb 09 '21 09:02

Your approach looks reasonable. An alternative is to use the existing Executor class together with a 'factory function', say create_executor. In this alternative, a function shared_ptr<const Executor> create_executor(OptionsType options); would read the environment variables and/or other in-program input, generate the correct concrete executor, and return it as a pointer-to-Executor. Then I guess we don't need a FakeExecutor, and get_concrete_executor in PolymorphicObject can still be implemented using, say, std::dynamic_pointer_cast.

What are the advantages of a new GenericExecutor class over this?

Slaedr · Feb 22 '21 15:02
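
For reference, the factory-function alternative raised in the comment above could look roughly like the sketch below. It is standalone and uses stand-in classes rather than Ginkgo's real executor hierarchy: the Executor, OmpExecutor and ReferenceExecutor types here are simplified mock-ups, OptionsType is reduced to a single hypothetical field, and as_concrete is an illustrative helper, not an existing Ginkgo function.

```cpp
#include <memory>
#include <string>

// Simplified stand-ins for the executor hierarchy (not Ginkgo's classes).
struct Executor {
    virtual ~Executor() = default;
    virtual std::string name() const = 0;
};

struct ReferenceExecutor : Executor {
    std::string name() const override { return "reference"; }
};

struct OmpExecutor : Executor {
    std::string name() const override { return "omp"; }
};

// Hypothetical options type; a real one would also carry the device ID etc.
struct OptionsType {
    std::string type = "reference";
};

// Factory function: picks the concrete executor and returns it through the
// base-class pointer, so callers never see a separate wrapper class.
inline std::shared_ptr<const Executor> create_executor(OptionsType options)
{
    if (options.type == "omp") {
        return std::make_shared<OmpExecutor>();
    }
    return std::make_shared<ReferenceExecutor>();
}

// Recovers the concrete type when it is needed; yields nullptr on mismatch.
template <typename Concrete>
std::shared_ptr<const Concrete> as_concrete(
    const std::shared_ptr<const Executor>& exec)
{
    return std::dynamic_pointer_cast<const Concrete>(exec);
}
```

In this sketch the object returned by the factory already is the concrete executor, so code that needs the concrete type only performs a checked downcast and no forwarding layer is involved.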