Arturo Vargas
Hi @delcmo, if I understand correctly -- and we can iterate on this. Regarding the first part I think you can use the same global policies for the outer for...
Hi @delcmo, yes, the proposed code would work. Regarding the error, I am actually not sure. Could you try running it through cuda-gdb and share a stack...
To double check, are you choosing values for NThreads or NTeams_{x,y,z}? The maximum number of threads per block (a RAJA Team) in CUDA is 1024, so NThreads = 10 should work. I'll...
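As a rough illustration of the 1024-thread limit mentioned above, here is a minimal sketch of a check one could run before launching. The helper names `fits_in_cuda_block` and `require_fits_in_cuda_block` are hypothetical (they are not part of RAJA or CUDA), and the per-dimension limits of 1024, 1024, and 64 are assumed from the CUDA limits for compute capability 2.0 and newer; query the device properties if you need the exact values for your hardware.

```cpp
#include <cassert>

// Assumed CUDA block limits (compute capability 2.0 and newer).
// These are hard-coded for illustration; a real code would query the device.
constexpr long kMaxThreadsPerBlock = 1024;  // total threads in one block
constexpr long kMaxBlockDimX = 1024;        // per-dimension limits
constexpr long kMaxBlockDimY = 1024;
constexpr long kMaxBlockDimZ = 64;

// Hypothetical helper: true if a block of nx * ny * nz threads respects
// both the per-dimension limits and the total-threads limit.
constexpr bool fits_in_cuda_block(long nx, long ny, long nz) {
  return nx > 0 && ny > 0 && nz > 0 &&
         nx <= kMaxBlockDimX && ny <= kMaxBlockDimY && nz <= kMaxBlockDimZ &&
         nx * ny * nz <= kMaxThreadsPerBlock;
}

// Hypothetical runtime guard to call before a launch.
inline void require_fits_in_cuda_block(long nx, long ny, long nz) {
  assert(fits_in_cuda_block(nx, ny, nz) &&
         "thread counts exceed the assumed CUDA block limits");
}

// 10 threads in each of three dimensions is 1000 threads, which fits.
static_assert(fits_in_cuda_block(10, 10, 10), "10^3 = 1000 <= 1024");
// 32 x 32 x 2 is 2048 threads, which exceeds the total limit.
static_assert(!fits_in_cuda_block(32, 32, 2), "2048 > 1024");
```

Under these assumptions, 10 threads per dimension fits even in 3D (1000 threads), while 11 per dimension (1331 threads) would not, so the limit applies to the product of the thread counts rather than to each one separately.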
> @artv3 I've rebuilt and installed the latest version of the SYCL HIP compiler on `corona` in `/usr/workspace/raja-dev/clang_sycl_hip_gcc10.2.1_rocm5.1.0/install`. You can try the compiler out by following steps `A` and `C`...
> > I hit the following build errors:
> >
> > ```
> > clang-14: error: unknown argument: '-fsycl-unnamed-lambda'
> > clang-14: error: unknown argument: '-fsycl-targets=amdgcn-amd-amdhsa'
> > ```
> >...
Hi all, I feel this PR might be in pretty good shape to start reviewing. I do want to propose an additional API change: should we change Teams -> Blocks...
To check out the dynamic shared memory, I suggest looking at the following example: `dynamic_mat_transpose.cpp`. It has support for seq/openmp/cuda/hip/sycl.
> > Second, I have a prototype for perfectly nested loops here; it's not very flexible, though. My thought was to remove what is in there and follow up with...
@MrBurmark @trws @homerdin @rhornung67 @rchen20, this PR is ready for review.
> There are a couple other experimental documentation sentences which should be removed or edited, and there are a couple `loop` and `tile` policies still in the `::expt` namespace. Otherwise,...