Add ability in OKL to call an OCCA kernel from within another OCCA kernel routine

Open. pdhahn opened this issue 6 years ago • 1 comment

It would be very useful if OKL let the programmer invoke a nested kernel (an outer/inner loop nest) defined in a different OCCA kernel routine. This would make it easier to build sequences of nested kernel invocations (e.g., in a "dataflow" sense) within a containing OCCA kernel routine. If device data persists across a sequence of calls to nested kernels, it could also cut the overhead of invoking a sequence of separate OCCA kernel routines from host code, and thereby improve performance in those cases.

Right now, the only ways to re-use code within OKL are cut-and-paste, OCCA functions, or macros. Cut-and-paste is usually a bad idea for the obvious reasons. OCCA functions cannot contain parallel code that uses outer/inner to define "nested" kernels. That leaves macros, which are not ideal, especially if the code is lengthy, complicated, or needs in-line comments.
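To make the macro workaround concrete, here is a minimal sketch in plain C rather than OKL. It assumes a serial stand-in for the loop nest: in OKL the two loops would carry the outer and inner annotations, and the names `SCALE_NEST`, `BLOCK`, `scale`, and `scale_then_shift` are hypothetical, not part of OCCA.

```c
#include <assert.h>
#include <stddef.h>

#define BLOCK 4 /* hypothetical block size; stands in for the inner loop extent */

/* The reusable "nested kernel" has to live in a macro, because a function
   could not hold outer/inner loops in OKL. The first loop plays the role of
   the outer loop, the second the inner loop. Every line needs a trailing
   backslash and cannot carry a // comment, which is the readability cost
   described above. */
#define SCALE_NEST(n, a, x)                                 \
  for (size_t o = 0; o < (n); o += BLOCK) {                 \
    for (size_t i = o; i < o + BLOCK && i < (n); ++i) {     \
      (x)[i] *= (a);                                        \
    }                                                       \
  }

/* First "kernel routine": just the shared loop nest. */
void scale(size_t n, float a, float *x) {
  assert(x != NULL);
  SCALE_NEST(n, a, x)
}

/* Second "kernel routine": reuses the same nest, then runs its own nest. */
void scale_then_shift(size_t n, float a, float b, float *x) {
  assert(x != NULL);
  SCALE_NEST(n, a, x)
  for (size_t o = 0; o < n; o += BLOCK) {
    for (size_t i = o; i < o + BLOCK && i < n; ++i) {
      x[i] += b;
    }
  }
}
```

The macro body is pasted textually into each routine, so the reuse works, but debugging and commenting the shared nest is awkward once it grows beyond a few lines.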

One suggestion is to imitate the old Fortran ENTRY concept: in OKL, nested kernels defined in one OCCA kernel routine could be called from within another (or even the same) OCCA kernel routine. Ideally, a new OKL keyword would mark the "ENTRY" point explicitly, while the OCCA kernel routine name would implicitly be an ENTRY (i.e., at the top, before the first nested kernel).

To keep things feasible as well as simple, an ENTRY could only be declared at the level where the nested kernels are defined, not within the nested kernels themselves. Likewise, a call to an ENTRY would always have to appear at the level where nested kernels are invoked, not within the nested kernels themselves.
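The intended semantics can be sketched in plain C, since OKL has no such feature today. This is only a model under stated assumptions: ordinary C functions stand in for the proposed ENTRY points, serial loops stand in for the outer/inner loops, and every name (`entry_square`, `entry_block_sums`, `sum_of_squares`, `BLOCK`) is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define BLOCK 4 /* hypothetical block size; stands in for the inner loop extent */

/* Hypothetical ENTRY: a named loop nest that squares each element in place. */
static void entry_square(size_t n, float *x) {
  for (size_t o = 0; o < n; o += BLOCK) {                 /* "outer" loop */
    for (size_t i = o; i < o + BLOCK && i < n; ++i) {     /* "inner" loop */
      x[i] *= x[i];
    }
  }
}

/* Hypothetical ENTRY: a named loop nest that writes one partial sum per block. */
static void entry_block_sums(size_t n, const float *x, float *partial) {
  for (size_t o = 0; o < n; o += BLOCK) {                 /* "outer" loop */
    float s = 0.0f;
    for (size_t i = o; i < o + BLOCK && i < n; ++i) {     /* "inner" loop */
      s += x[i];
    }
    partial[o / BLOCK] = s;
  }
}

/* Containing "kernel routine": calls ENTRYs only at its top level, never from
   inside a loop nest. x and partial model device buffers that stay alive
   between the chained nests, with no return to the caller in between. */
float sum_of_squares(size_t n, float *x, float *partial) {
  assert(x != NULL && partial != NULL);
  entry_square(n, x);
  entry_block_sums(n, x, partial);
  float total = 0.0f;
  size_t blocks = (n + BLOCK - 1) / BLOCK;
  for (size_t b = 0; b < blocks; ++b) {
    total += partial[b];
  }
  return total;
}
```

Both restrictions from the proposal show up here: the loop nests are defined at the top level of their own routines, and the calls to them appear only where a loop nest could otherwise be written.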

pdhahn, Jun 20 '18 20:06

Besides "dataflow", another concept this would enable is "nested kernel chaining", e.g., for preconditioning or reductions. Again, it would all happen "under the hood", so to speak, without the overhead of going back and forth to the calling code at the host level.

pdhahn, Jun 20 '18 21:06