Pass variables outside @outer loops to launched kernels as additional arguments
kernel void k(void *qdata, const double *u){
  for (CeedInt i=0; i<Q; i++; outer) {
    const CeedScalar *J=u+Q*NC;
    CeedScalar *qd = (double*) qdata;
    qd[...] = J[...];
  }
}
This works, but if we move the declaration of J before the for loop, the CUDA kernel won't get the right pointer J:
kernel void k(void *qdata, const double *u){
  const CeedScalar *J=u+Q*NC;
  CeedScalar *qd = (double*) qdata;
  for (CeedInt i=0; i<Q; i++; outer) {
    qd[...] = J[...];
  }
}
I think the pointer is being incremented before it is passed to the kernel, which results in an invalid pointer (since the pointer is actually a handle to the CUDA pointer).
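To make that guess concrete, here is a minimal sketch in plain C. It only models the hypothesis above and is not OCCA's implementation: `Handle`, `offset_after_unwrap` and `offset_before_unwrap` are hypothetical names invented for illustration, and `offset` plays the role of `Q*NC`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a host-side memory handle; NOT OCCA's real type. */
typedef struct {
  double *device_ptr; /* address the device kernel must actually receive */
} Handle;

/* Model of the working case: unwrap the handle first, then apply the offset
   (J declared inside the outer loop). */
static double *offset_after_unwrap(const Handle *h, size_t offset) {
  assert(h != NULL && h->device_ptr != NULL);
  return h->device_ptr + offset;
}

/* Model of the suspected failing case: the offset is applied to the host-side
   value, i.e. the handle's own address, before any unwrapping. Integer
   arithmetic keeps the sketch itself well defined. */
static uintptr_t offset_before_unwrap(const Handle *h, size_t offset) {
  assert(h != NULL);
  return (uintptr_t)h + offset * sizeof(double);
}
```

If this model is right, the first function returns an address inside the buffer, while the second returns an address near the handle object, which is unrelated to the buffer's contents.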
Isn't this just an issue with address spaces?
As for me, I learned to take care not to do what is shown in the example. :-) For pre-1.0 OCCA at least, the address space issue with code located inside versus outside the for-outer-inner block (the actual nested kernel) had a learning curve for me personally.
BTW, see also the somewhat related #82, which I think involves similar address space issues.
I understand the issue, and how the parser instantiates the nested kernels later in the code. What bothers me is that the provided kernel should be in its own space right away, regardless of where the for loops sit in its body. Once you are aware of this behavior, that's OK!
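For anyone else who hits this, here is a rough hand-written picture in plain C of the split I have in mind. It is only a sketch under assumptions: `k_launch` and `k_outer_body` are hypothetical names, the host loop stands in for a real device launch, and the `[i]` indexing is illustrative because the snippets above elide it.

```c
#include <stddef.h>

/* Hypothetical device-side function generated from the outer loop body.
   It receives only raw pointers and scalars, and derives J itself. */
static void k_outer_body(size_t i, double *qd, const double *u,
                         size_t Q, size_t NC) {
  const double *J = u + Q * NC; /* offset applied in the kernel's own space */
  qd[i] = J[i];                 /* illustrative indexing only */
}

/* Hypothetical host-side launcher: statements written outside the outer loop
   would end up here. The plain loop stands in for the device launch. */
static void k_launch(double *qdata, const double *u, size_t Q, size_t NC) {
  for (size_t i = 0; i < Q; i++) {
    k_outer_body(i, qdata, u, Q, NC);
  }
}
```

The pattern is to forward the base pointer and the sizes unchanged, and to do any pointer arithmetic inside the outer loop body.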