
Generic Kernels: performance regression in non-MMA UV, VUV, Yhat for typical staggered sizes

Open · weinbe2 opened this issue 4 years ago · 0 comments

A comparison of performance numbers between release/1.1.x and GK (Generic Kernels) shows a noteworthy regression in the non-MMA UV, VUV, and Yhat kernels for typical staggered problem sizes and Nc values (24^4 local volume, Nc = 24, 64, 96). One possible fix is re-tuning tile sizes for staggered problems, which is already a known outstanding issue (for now, the cost of staggered setup is heavily dominated by near-null vector generation anyway). We should revisit this after GK is merged, as part of porting the staggered KD clean-up.

Of note, none of the MMA versions of these kernels are affected by the release/1.1.x -> GK transition.

weinbe2 · Sep 28 '21 20:09