RAJAPerf
Updated FIR to add a __shared__ (LDS) optimization to the Base_HIP variant.
Modified RAJAPerf/src/apps/{FIR-Hip.cpp,FIR.hpp} so that the Base_HIP variant uses shared (LDS) memory, which reduces pressure on the vL1D and L2 caches. This gave a speedup of more than 1.5x under ROCm-6.4.0 with --size 100000000 (1E8).
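For reviewers who want the idea without reading the diff, here is a rough host-side C++ sketch of the tiling pattern that a shared-memory FIR typically follows. It is not the code in FIR-Hip.cpp. The names fir_direct, fir_tiled and kCoeffLen are hypothetical, the 16-tap filter length is an assumption, and the per-block `tile` vector only stands in for a `__shared__` buffer. Each "block" copies its inputs plus a halo of kCoeffLen - 1 elements once, then every output reads its taps from that local copy instead of from global memory.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

constexpr std::size_t kCoeffLen = 16;  // assumed filter length (hypothetical name)

// Direct form: every output reads all of its taps straight from `in`,
// so neighbouring outputs re-read almost the same input elements.
std::vector<double> fir_direct(const std::vector<double>& in,
                               const double (&coeff)[kCoeffLen]) {
  assert(in.size() >= kCoeffLen);
  const std::size_t n_out = in.size() - kCoeffLen + 1;
  std::vector<double> out(n_out, 0.0);
  for (std::size_t i = 0; i < n_out; ++i) {
    double sum = 0.0;
    for (std::size_t j = 0; j < kCoeffLen; ++j) sum += coeff[j] * in[i + j];
    out[i] = sum;
  }
  return out;
}

// Tiled form: emulates one GPU thread block per outer iteration.
// `tile` plays the role of the __shared__ (LDS) buffer.
std::vector<double> fir_tiled(const std::vector<double>& in,
                              const double (&coeff)[kCoeffLen],
                              std::size_t block_size) {
  assert(block_size > 0 && in.size() >= kCoeffLen);
  const std::size_t n_out = in.size() - kCoeffLen + 1;
  std::vector<double> out(n_out, 0.0);
  std::vector<double> tile(block_size + kCoeffLen - 1);

  for (std::size_t base = 0; base < n_out; base += block_size) {
    const std::size_t n_active = std::min(block_size, n_out - base);
    const std::size_t n_stage = n_active + kCoeffLen - 1;  // outputs + halo

    // Staging step: on a GPU each thread would copy one element (a few
    // threads also copy the halo), followed by a block-wide barrier.
    std::copy_n(in.begin() + static_cast<std::ptrdiff_t>(base), n_stage,
                tile.begin());

    // Compute step: each "thread" t reads its taps from the local tile only.
    for (std::size_t t = 0; t < n_active; ++t) {
      double sum = 0.0;
      for (std::size_t j = 0; j < kCoeffLen; ++j) sum += coeff[j] * tile[t + j];
      out[base + t] = sum;
    }
  }
  return out;
}
```

In this sketch each input element is fetched from `in` once per block rather than up to kCoeffLen times, which is the kind of traffic reduction the change targets in the vL1D and L2 caches. The sketch says nothing about actual GPU timings.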