Corbin Robeck

Results 11 issues of Corbin Robeck

This PR adds a basic functionality test that leverages the matrix cores on AMD gfx908 and gfx90a hardware for dense matrix products.

Compiling with gcc 8.1.0 + nvcc 11.2.67 gives the following build error:
/RAJAPerf/src/basic/MAT_MAT_SHARED-Cuda.cpp: In instantiation of 'void rajaperf::basic::MAT_MAT_SHARED::runCudaVariantImpl(rajaperf::VariantID) [with long unsigned int block_size = 256]':
/tmp/tmpxft_002d5158_00000000-6_MAT_MAT_SHARED-Cuda.cudafe1.stub.c:12:193: required from here
RAJAPerf/src/basic/MAT_MAT_SHARED-Cuda.cpp:234:4:...

build
cuda

The AMD gfx90a architecture has built-in hardware support (matrix cores) for dense matrix operations. The two builtins of interest to RAJA are:
- __builtin_amdgcn_mfma_f64_4x4x4f64
- __builtin_amdgcn_mfma_f64_16x16x4f64
Done criteria:
- Add a...

enhancement
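
A functionality test for a matrix-core kernel needs a host-side result to compare against. The following is a minimal sketch of such a reference, assuming the kernel computes a dense multiply-accumulate D = A * B + C on one small tile. The name mma_reference and the row-major layout are assumptions for illustration; they are not part of RAJA or the HIP toolchain, and the sketch does not model how the hardware distributes tile entries across lanes.

```cpp
#include <array>
#include <cstddef>

// Host reference for D = A * B + C on a single tile.
// A is M x K, B is K x N, C and D are M x N, all stored row-major.
// Hypothetical helper for checking a device result; not a RAJA or HIP API.
template <std::size_t M, std::size_t N, std::size_t K>
std::array<double, M * N> mma_reference(const std::array<double, M * K>& a,
                                        const std::array<double, K * N>& b,
                                        const std::array<double, M * N>& c)
{
  std::array<double, M * N> d = c;  // start from the accumulator input
  for (std::size_t i = 0; i < M; ++i) {
    for (std::size_t j = 0; j < N; ++j) {
      for (std::size_t k = 0; k < K; ++k) {
        d[i * N + j] += a[i * K + k] * b[k * N + j];
      }
    }
  }
  return d;
}
```

Instantiating it as mma_reference<4, 4, 4> or mma_reference<16, 16, 4> gives host references with the tile shapes that the two builtin names suggest.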

1. Cover the full variety of different reducers in RAJA
2. Add a test that does one min reduction
3. Add a test that does one min-loc reduction
4. Add...

enhancement
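
The min and min-loc tests listed above each need an expected value. The following is a sketch of sequential host references that a reducer result could be checked against. The names MinLoc, reference_min, and reference_minloc are hypothetical and do not come from RAJA. On ties this reference returns the first index; a parallel reducer may not make the same choice, so test data with a unique minimum is the safer assumption.

```cpp
#include <cstddef>
#include <limits>
#include <vector>

// Value and position of a minimum. Hypothetical type, not a RAJA type.
struct MinLoc {
  double value;
  std::ptrdiff_t index;  // -1 when the input is empty
};

// Sequential minimum; returns the largest finite double for empty input.
double reference_min(const std::vector<double>& data)
{
  double m = std::numeric_limits<double>::max();
  for (double v : data) {
    if (v < m) m = v;
  }
  return m;
}

// Sequential minimum with location; the first occurrence wins on ties.
MinLoc reference_minloc(const std::vector<double>& data)
{
  MinLoc r{std::numeric_limits<double>::max(), -1};
  for (std::size_t i = 0; i < data.size(); ++i) {
    if (data[i] < r.value) {
      r.value = data[i];
      r.index = static_cast<std::ptrdiff_t>(i);
    }
  }
  return r;
}
```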

We are interested in kernels for solving the advection equation M[du/dt] = K u with DG-FEM. We can break this up into two parts.
1. y = inv(M) x Since...

enhancement
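
The first part, y = inv(M) x, can be sketched on the host as follows. This assumes that M is block diagonal with one block per element, as is common for DG discretizations, and that the inverse of each block has already been computed; the truncated issue text does not confirm either point. The function name apply_block_inverse and the storage layout are hypothetical.

```cpp
#include <cstddef>
#include <vector>

// Computes y = inv(M) x for a block-diagonal M.
// minv holds num_elem blocks of size ndof x ndof, each row-major, and each
// assumed to be the precomputed inverse of one element mass matrix.
// x holds ndof entries per element, stored element by element.
std::vector<double> apply_block_inverse(const std::vector<double>& minv,
                                        const std::vector<double>& x,
                                        std::size_t num_elem,
                                        std::size_t ndof)
{
  std::vector<double> y(num_elem * ndof, 0.0);
  for (std::size_t e = 0; e < num_elem; ++e) {
    for (std::size_t i = 0; i < ndof; ++i) {
      for (std::size_t j = 0; j < ndof; ++j) {
        y[e * ndof + i] += minv[(e * ndof + i) * ndof + j] * x[e * ndof + j];
      }
    }
  }
  return y;
}
```

Each element is independent of the others in this form, which is why the outer loop is a natural candidate for a parallel kernel.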

Currently sizeof(long double) returns different values in the AMDGPU backend depending on whether it is called from the CPU or the GPU. This could cause storage/alignment issues if not handled correctly. Add a test...

testing
reviewed
hip support
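
The host half of such a check can be sketched as below. The device half is assumed rather than shown: a kernel would write its own sizeof(long double) and alignof(long double) into a buffer that the host reads back. TypeLayout, host_layout, and layouts_match are hypothetical names introduced for illustration only.

```cpp
#include <cstddef>

// Size and alignment of a type as seen by one compilation target.
// Hypothetical helper type, not part of any backend API.
struct TypeLayout {
  std::size_t size;
  std::size_t align;
};

// Layout of T as seen by the host compiler.
template <typename T>
constexpr TypeLayout host_layout()
{
  return {sizeof(T), alignof(T)};
}

// True when host and device agree on both size and alignment.
// The device value is assumed to have been copied back from a kernel.
constexpr bool layouts_match(const TypeLayout& host, const TypeLayout& device)
{
  return host.size == device.size && host.align == device.align;
}
```

A test built on this would fail, or be marked as an expected failure, whenever layouts_match returns false for long double.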

This is technically an architecture-agnostic patch, but AMDGPU is currently the only backend that would make use of it.

Follow-on to: https://github.com/triton-lang/triton/pull/4137/

Add the init and allocation of the Proton dialect device buffer that can be used in place of the shared memory buffer. The device buffer is just a module local,...