Corbin Robeck issues

Results: 11 issues of Corbin Robeck

hip mfma tests

This PR adds a basic functionality test that leverages the matrix cores on AMD gfx908 and gfx90a hardware for dense matrix products.

Minimum required gcc 9.0 when compiling with nvcc?

Compiling with gcc 8.1.0 + nvcc 11.2.67 gives the following build error:

/RAJAPerf/src/basic/MAT_MAT_SHARED-Cuda.cpp: In instantiation of 'void rajaperf::basic::MAT_MAT_SHARED::runCudaVariantImpl(rajaperf::VariantID) [with long unsigned int block_size = 256]':
/tmp/tmpxft_002d5158_00000000-6_MAT_MAT_SHARED-Cuda.cudafe1.stub.c:12:193: required from here
RAJAPerf/src/basic/MAT_MAT_SHARED-Cuda.cpp:234:4:...

build

cuda

Add matrix cores test to perf suite

The AMD gfx90a architecture has built-in hardware support (matrix cores) for dense matrix operations. The two builtins of interest to RAJA are:
- __builtin_amdgcn_mfma_f64_4x4x4f64
- __builtin_amdgcn_mfma_f64_16x16x4f64
Done criteria:
- Add a...

enhancement
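A matrix cores test needs a trusted result to compare against. The following is a minimal host-side sketch of such a reference, assuming row-major storage and the accumulate form D = A * B + C. The names reference_mma and max_abs_diff are hypothetical and are not taken from RAJA or the RAJA Performance Suite, and the sketch does not call the MFMA builtins themselves.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Host reference: D = A * B + C for row-major matrices.
// A is M x K, B is K x N, C and D are M x N.
// (Hypothetical helper for illustration only.)
std::vector<double> reference_mma(const std::vector<double>& A,
                                  const std::vector<double>& B,
                                  const std::vector<double>& C,
                                  std::size_t M, std::size_t N, std::size_t K)
{
  std::vector<double> D(M * N);
  for (std::size_t i = 0; i < M; ++i) {
    for (std::size_t j = 0; j < N; ++j) {
      double acc = C[i * N + j];            // start from the accumulator input
      for (std::size_t k = 0; k < K; ++k) {
        acc += A[i * K + k] * B[k * N + j];
      }
      D[i * N + j] = acc;
    }
  }
  return D;
}

// Largest absolute element-wise difference between two equally sized vectors.
// A device result could be checked against the reference with this.
double max_abs_diff(const std::vector<double>& x, const std::vector<double>& y)
{
  double worst = 0.0;
  for (std::size_t i = 0; i < x.size(); ++i) {
    worst = std::fmax(worst, std::fabs(x[i] - y[i]));
  }
  return worst;
}
```

A device test could fill A, B and C with known values, run the matrix-core kernel with M = 16, N = 16, K = 4, and require max_abs_diff against reference_mma to stay below a chosen tolerance.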

Add Tests to Perf Suite

1. Cover the full variety of different reducers in RAJA
2. Add a test that does one min reduction
3. Add a test that does one min-loc reduction
4. Add...

enhancement
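To make the min and min-loc items above concrete, here is a minimal sequential sketch of what each reduction computes. It is plain C++ rather than RAJA code: MinLoc, min_reduce and minloc_reduce are hypothetical names, and returning the first index on ties is an assumption of this sketch.

```cpp
#include <cstddef>
#include <limits>
#include <vector>

// Result of a min-loc reduction: the minimum value and where it was found.
struct MinLoc {
  double value;
  std::ptrdiff_t index;   // -1 when the input is empty
};

// Min reduction: smallest value in x (max double for an empty input).
double min_reduce(const std::vector<double>& x)
{
  double m = std::numeric_limits<double>::max();
  for (double v : x) {
    if (v < m) m = v;
  }
  return m;
}

// Min-loc reduction: smallest value and the index of its first occurrence.
MinLoc minloc_reduce(const std::vector<double>& x)
{
  MinLoc r{std::numeric_limits<double>::max(), -1};
  for (std::size_t i = 0; i < x.size(); ++i) {
    if (x[i] < r.value) {   // strict '<' keeps the first index on ties
      r.value = x[i];
      r.index = static_cast<std::ptrdiff_t>(i);
    }
  }
  return r;
}
```

A perf-suite kernel would compute the same quantities through the reducer objects of the chosen back end. A sequential version like this can serve as the checksum reference.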

Add MARBL Matrix Free Solve Test

We are interested in kernels for solving the advection equation M[du/dt] = K u with DG-FEM. We can break this up into two parts.
1. y = inv(M) x Since...

enhancement

Add a test that exercises long double in a way that would break storage/alignment if not handled correctly by the compiler

Currently, sizeof(long double) returns different values in the AMDGPU backend depending on whether it is called from the CPU or the GPU. This could cause storage/alignment issues if not handled correctly. Add a test...

testing

reviewed

hip support
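The long double issue above is about data next to a long double member being corrupted when host and device disagree on its size. Below is a minimal host-only sketch of the checking pattern, assuming sentinel fields placed on both sides of the long double and a by-value lambda capture. Payload and sentinels_survive_capture are hypothetical names. A real test would run the lambda body inside a GPU kernel, which this sketch does not do.

```cpp
#include <cstddef>
#include <cstdint>

// Struct with sentinels on both sides of the long double member, so a
// size or alignment mismatch would show up as a corrupted sentinel.
struct Payload {
  std::int32_t head;   // sentinel before the long double
  long double  wide;   // the type whose size may differ between host and device
  std::int32_t tail;   // sentinel after the long double
};

// Capture the struct by value in a lambda and verify that every field
// reads back unchanged. (Host-only; hypothetical helper for illustration.)
bool sentinels_survive_capture()
{
  Payload p{0x12345678, 1.5L, 0x0BADCAFE};
  auto body = [p]() {
    return p.head == 0x12345678 && p.wide == 1.5L && p.tail == 0x0BADCAFE;
  };
  return body();
}
```

In a device version the same comparison would be made inside the kernel and the result copied back, alongside sizeof(Payload) as seen on each side, so that a host/device disagreement is reported rather than silently tolerated.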

Add a kernel that exercises long double in a struct captured in a lambda to make sure storage/alignment of unsupported types are not corrupted

[AMDGPU] Allow kernel instrumentation passes to be added to pipeline

[PROTON][AMD] Add HipGraph Support For AMD GPUs

[Proton][Dialect] Add Proton Device Memory Buffer Init and Allocate Pass