ginkgo
Generic executor
Adds a Generic Executor to dynamically select any concrete Executor
The behavior of the GenericExecutor can be controlled through three parameters, which can be set either through the constructor or as environment variables. The controls are (environment variable version):
- GINKGO_GENERIC_EXEC_TYPE : a specific executor type to target. One of "cuda", "hip", "dpcpp", "omp", "reference", or the default "all".
- GINKGO_GENERIC_EXEC_ID : a specific device ID to target. The default -1 allows any ID to be considered.
- GINKGO_GENERIC_EXEC_AUTO : can be set to 0 or 1. Controls whether subsequent calls should provide the next executor in the list, or keep the default behavior of providing the same one.
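As a rough illustration of how the three controls above could be read, here is a minimal standalone sketch. It does not use Ginkgo itself: the `generic_exec_settings` struct, `make_settings`, and `read_settings_from_env` are hypothetical names, and treating `"1"` as the value that enables the "next executor" behavior is an assumption.

```cpp
#include <cstdlib>
#include <string>

// Hypothetical settings holder; field names are illustrative, not Ginkgo's API.
struct generic_exec_settings {
    std::string type = "all";  // GINKGO_GENERIC_EXEC_TYPE
    int device_id = -1;        // GINKGO_GENERIC_EXEC_ID, -1 means any device
    bool auto_next = false;    // GINKGO_GENERIC_EXEC_AUTO (assumed: "1" enables)
};

// Builds the settings from raw strings; a null or empty string keeps the
// default. std::stoi throws if the ID is not a valid integer.
inline generic_exec_settings make_settings(const char* type, const char* id,
                                           const char* auto_flag)
{
    generic_exec_settings s;
    if (type != nullptr && *type != '\0') {
        s.type = type;
    }
    if (id != nullptr && *id != '\0') {
        s.device_id = std::stoi(id);
    }
    if (auto_flag != nullptr) {
        s.auto_next = std::string(auto_flag) == "1";
    }
    return s;
}

// Reads the three environment variables named in the description.
inline generic_exec_settings read_settings_from_env()
{
    return make_settings(std::getenv("GINKGO_GENERIC_EXEC_TYPE"),
                         std::getenv("GINKGO_GENERIC_EXEC_ID"),
                         std::getenv("GINKGO_GENERIC_EXEC_AUTO"));
}
```

Separating the parsing (`make_settings`) from the environment lookup mirrors the split described in this PR, where `create` handles the environment variables and the constructor only sees flags.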
In detail, this PR:
- Fixes the ginkgo-overhead example to use `1.0` instead of `NaN` (unrelated).
- Adds a new executor base, `FakeExecutorBase`, to simplify the implementation of this kind of executor.
- Adds the GenericExecutor with its three control parameters. The `create` function manages the environment variables, whereas the constructor only uses flags.
- The only part of the GenericExecutor that is a bit more involved is the `all` behavior, which by default goes through CUDA, HIP, and DPC++, in that order, to check whether any GPU is available, and otherwise falls back to either Omp or Reference, whichever was enabled.
- Uses `get_concrete_executor()` in `PolymorphicObject` to make all Ginkgo objects work transparently with this new kind of executor.
- Adds tests and an example based on simple-solver to show users how to make use of this new executor type and to let them play with the environment variables.
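The `all` selection order described above can be sketched as a simple priority chain. This is only an illustration under stated assumptions: `backend_availability` and `pick_backend` are hypothetical names, the device counts would come from the respective runtimes in a real implementation, and preferring Omp over Reference when Omp is enabled is an assumption.

```cpp
#include <string>

// Hypothetical snapshot of what is available on the machine.
struct backend_availability {
    int cuda_devices = 0;
    int hip_devices = 0;
    int dpcpp_devices = 0;
    bool omp_enabled = false;
};

// Checks the GPU backends in the order CUDA, HIP, DPC++, then falls back
// to a CPU executor (assumed: Omp if enabled, otherwise Reference).
inline std::string pick_backend(const backend_availability& a)
{
    if (a.cuda_devices > 0) {
        return "cuda";
    }
    if (a.hip_devices > 0) {
        return "hip";
    }
    if (a.dpcpp_devices > 0) {
        return "dpcpp";
    }
    return a.omp_enabled ? "omp" : "reference";
}
```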
Some possible issues:
- If HIP is also CUDA, does the behavior of `auto_different_exec` become hard to code?
- Is the `FakeExecutorBase` useful to the `MPI` executor, and what changes could be needed?
- Are there more convenience interface functions we could want to add to this executor?
- For now, only executors created with the GenericExecutor are tracked for occupancy. Is there a way to improve the hwloc interaction and information logging to track generic GPU availability, so that the `GenericExecutor` or its underlying facilities could be reused to track available devices?
- Is there any other behavior control we want to implement?
TODO:
- [ ] Double check all tests.
format!
Your approach looks reasonable. The other alternative is to just use the existing `Executor` class and a 'factory function', say `create_executor`. In this alternative, a function `shared_ptr<const Executor> create_executor(OptionsType options);` would read the environment variables and/or other in-program input, generate the correct concrete executor, and return it as pointer-to-`Executor`. Then I guess we don't need a `FakeExecutor`, and the `get_concrete_executor` in `PolymorphicObject` can still be implemented using, say, `std::dynamic_pointer_cast`.
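To make the factory-function alternative concrete, here is a minimal standalone sketch. The `Executor` hierarchy below is a stand-in, not Ginkgo's real classes, and the single `type` field in `OptionsType` is a hypothetical placeholder for whatever options would actually be needed.

```cpp
#include <memory>
#include <string>

// Stand-in hierarchy for illustration only; not Ginkgo's actual classes.
struct Executor {
    virtual ~Executor() = default;
    virtual std::string name() const = 0;
};

struct ReferenceExecutor : Executor {
    std::string name() const override { return "reference"; }
};

struct OmpExecutor : Executor {
    std::string name() const override { return "omp"; }
};

// Hypothetical options type; a real one could also carry a device ID.
struct OptionsType {
    std::string type = "reference";
};

// Factory: picks the concrete executor and returns it as pointer-to-Executor.
inline std::shared_ptr<const Executor> create_executor(OptionsType options)
{
    if (options.type == "omp") {
        return std::make_shared<OmpExecutor>();
    }
    return std::make_shared<ReferenceExecutor>();
}

// Callers that need the concrete type can recover it with a dynamic cast,
// which yields nullptr when the executor is of a different type.
inline std::shared_ptr<const OmpExecutor> as_omp(
    const std::shared_ptr<const Executor>& exec)
{
    return std::dynamic_pointer_cast<const OmpExecutor>(exec);
}
```

With this shape, no wrapper class is involved: the returned object already is the concrete executor, and code that only needs the base interface never has to unwrap anything.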
What are the advantages of a new `GenericExecutor` class over this?