Alex Brown
Reduces data copying and comparisons when reading arrays, maps, and enums.
Script to remove from a logic file any kernels that are not used by any of the tuned sizes.
Alternative implementation of the 2-tile algorithm that processes data-parallel (DP) tiles first and stream-k (SK) tiles afterward. This ordering should give a small performance boost.
This update fixes the alpha=0 case by ensuring that the A/B matrices are not read and the main loop does not run. Also added a new small test case with stream-k...
GSU=0 should disable all GSU code. This change updates some sections of code that were still generating GSU-related code when GSU was disabled.
Allow wavegroup to be less than 4 in stream-k kernels. This change updates the partials and fixup code to take the number of waves into account. Added new test cases to...
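
To illustrate the alpha=0 entry listed above, here is a minimal host-side sketch of the intended behavior for C = alpha*A*B + beta*C. It is not the generated kernel code; the function name `gemm_reference` and the nested-list matrix layout are assumptions made only for this example.

```python
def gemm_reference(alpha, a, b, beta, c):
    """Return alpha * (A @ B) + beta * C for matrices stored as nested lists.

    Hypothetical reference helper, not part of the library.
    """
    m, n = len(c), len(c[0])
    if alpha == 0:
        # Early exit: A and B are never read and the main (k) loop never runs.
        return [[beta * c[i][j] for j in range(n)] for i in range(m)]

    k = len(b)
    out = []
    for i in range(m):
        row = []
        for j in range(n):
            acc = 0
            for p in range(k):  # main loop over the shared dimension
                acc += a[i][p] * b[p][j]
            row.append(alpha * acc + beta * c[i][j])
        out.append(row)
    return out
```

Passing `None` for `a` and `b` with `alpha=0` is a simple way to check that neither input is touched on the early-exit path.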