ginkgo
(CUDA) kernels inside namespace or suffix `_kernel`
In most cases, we append the suffix `_kernel` to a function to identify it as a function that is not executed on the host machine (currently only CUDA code). However, there are instances where we instead use a `namespace kernel` around these functions (as far as I can tell, this happens in `cuda/matrix/dense_kernels.cu`, `cuda/components/zero_array.cu` and `cuda/preconditioner/jacobi_kernels.cu`).
What do you think about that, @ginkgo-project/developers ?
We should add it to the Contributing guidelines when we have decided on the matter.
Personally, I prefer the suffix `_kernel`: it makes it clear at every call site, and when looking at the function directly, that this is a special function whose call launches it on a different device (with a namespace, you would also have to look at the code surrounding the function). But I don't mind the alternative either; I would just like it to be consistent.
Just for completeness, `jacobi_kernels.cu` and `zero_array.cu` also use the `namespace kernel`.
@tcojean True, I guess my search was not thorough enough. Updated the issue description.
The `namespace kernel` is also used in all executor files: `omp/` and `reference/` as well as `cuda/`. And only in the `cuda/` folder do some of the functions have the suffix `_kernel`. There we specifically don't have sub-functions to add the `_kernel` suffix to (at least not yet). So maybe the namespace is a good idea?
@pratikvn Really, where? Every part is inside a `namespace kernels`, inside which a `namespace omp`, `reference` or `cuda` follows. But I did not find any function encapsulated inside a `namespace kernel` in the others.
> Every part is inside a namespace kernels, after which a namespace omp, reference or cuda comes

That is what I meant. And I think that is enough? In `cuda/matrix/dense_kernels.cu` and others, some functions are additionally placed in a separate namespace, and that is confusing.
The question is how we should encapsulate the actual CUDA kernels (those that have to be launched with `<<<grid, dim, ...>>>`) in order to associate them with the global function. For example, if an algorithm has a function `initialize`, we name the associated CUDA kernel (which is called from `initialize`) either `initialize_kernel`, or we put it inside an additional namespace as `kernel::initialize`.
We might actually need the namespace, so we can also put the `__device__` functions in there.