
Updated FIR to add __shared__ optimization of Base_HIP variant.

Open · mxxw opened this issue 3 months ago · 0 comments

Modified RAJAPerf/src/apps/{FIR-Hip.cpp,FIR.hpp} so that the Base_HIP variant stages input data in shared memory (LDS). This reduces pressure on the vL1D and L2 caches and yielded a speedup of more than 1.5x under ROCm 6.4.0 with --size 100000000 (1E8).
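To make the idea concrete, here is a host-side C++ sketch that emulates the tiling scheme on the CPU. It is not the actual HIP change. It assumes the kernel computes out[i] = sum over j of coeff[j] * in[i + j] with a fixed number of taps. The values kCoeffLen = 16 and kBlockSize = 256 are hypothetical, and fir_naive and fir_tiled are illustrative names, not RAJAPerf functions. The local tile array stands in for a __shared__ buffer, and a counter tracks how often the input array ("global memory") is read.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical parameters for illustration; not taken from the RAJAPerf sources.
constexpr std::size_t kCoeffLen  = 16;   // number of FIR taps (assumed)
constexpr std::size_t kBlockSize = 256;  // outputs per block/tile (assumed)

// Naive form: every output reads kCoeffLen inputs directly from "global" memory.
// global_reads counts every access to the input array.
std::vector<double> fir_naive(const std::vector<double>& in,
                              const std::vector<double>& coeff,
                              std::size_t& global_reads) {
  assert(in.size() > kCoeffLen && coeff.size() == kCoeffLen);
  const std::size_t n_out = in.size() - kCoeffLen;
  std::vector<double> out(n_out, 0.0);
  global_reads = 0;
  for (std::size_t i = 0; i < n_out; ++i) {
    double sum = 0.0;
    for (std::size_t j = 0; j < kCoeffLen; ++j) {
      sum += coeff[j] * in[i + j];
      ++global_reads;
    }
    out[i] = sum;
  }
  return out;
}

// Tiled form: each tile loads its inputs plus a halo of kCoeffLen - 1 once,
// then computes all of its outputs from the local copy.
std::vector<double> fir_tiled(const std::vector<double>& in,
                              const std::vector<double>& coeff,
                              std::size_t& global_reads) {
  assert(in.size() > kCoeffLen && coeff.size() == kCoeffLen);
  const std::size_t n_out = in.size() - kCoeffLen;
  std::vector<double> out(n_out, 0.0);
  global_reads = 0;
  double tile[kBlockSize + kCoeffLen - 1];  // stand-in for a __shared__ buffer
  for (std::size_t base = 0; base < n_out; base += kBlockSize) {
    const std::size_t outputs = std::min(kBlockSize, n_out - base);
    const std::size_t needed = outputs + kCoeffLen - 1;
    // Load phase: each input element of the tile is read from "global" memory once.
    for (std::size_t t = 0; t < needed; ++t) {
      tile[t] = in[base + t];
      ++global_reads;
    }
    // Compute phase: all reads come from the local tile, in the same order as fir_naive.
    for (std::size_t t = 0; t < outputs; ++t) {
      double sum = 0.0;
      for (std::size_t j = 0; j < kCoeffLen; ++j) sum += coeff[j] * tile[t + j];
      out[base + t] = sum;
    }
  }
  return out;
}
```

On a GPU, the load loop would be spread across the threads of a block and followed by __syncthreads() before the compute loop. With these assumed parameters, a full tile reads 256 + 15 = 271 input elements instead of 256 × 16 = 4096, which illustrates the kind of redundant vL1D/L2 traffic the change targets; the actual patch may partition the work differently.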

mxxw, Sep 10 '25 08:09