Jagadish Krishnamoorthy
Results: 2 issues of Jagadish Krishnamoorthy
When launching the apply_rotary_pos_half kernel, only a threads_per_head of 64 is supported for a wavefront size of 64. This change adds support for threads_per_head values below 64, such as 4, 8, and 16. Remove the...
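The summary above does not show how the smaller threads_per_head values are handled. As a rough host-side sketch only, one plausible layout is to pack several heads into a single 64-lane wavefront and derive each lane's head and its position within that head by integer division. The names `kWavefrontSize`, `heads_per_wavefront`, `LaneMapping`, and `map_lane` are hypothetical and do not come from the change itself.

```cpp
#include <stdexcept>

// Assumption: a wavefront has 64 lanes, as stated in the summary.
constexpr int kWavefrontSize = 64;

// Hypothetical helper: how many heads fit in one wavefront when each head
// uses threads_per_head lanes. Requires threads_per_head to divide 64.
int heads_per_wavefront(int threads_per_head) {
    if (threads_per_head <= 0 || threads_per_head > kWavefrontSize ||
        kWavefrontSize % threads_per_head != 0) {
        throw std::invalid_argument("threads_per_head must divide the wavefront size");
    }
    return kWavefrontSize / threads_per_head;
}

// Hypothetical mapping from a lane index to the head it serves and the
// lane's position inside that head.
struct LaneMapping {
    int head_in_wavefront;
    int lane_in_head;
};

LaneMapping map_lane(int lane, int threads_per_head) {
    return LaneMapping{lane / threads_per_head, lane % threads_per_head};
}
```

Under this assumed layout, a threads_per_head of 16 gives four heads per wavefront, and a threads_per_head of 64 reduces to the original one-head-per-wavefront case.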
1. In Context.cpp, change the variable allow_tf32 from static const to a non-const variable. This lets changes to the environment variable HIPBLASLT_ALLOW_TF32 take effect.
2. Add ROCm arch...
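To illustrate why the static const qualifier matters, here is a minimal sketch, not the actual Context.cpp code. A function-local `static const` is initialized once on first use, so later changes to the environment variable are not seen, while a plain local re-reads it on every call. The helper names and the convention that the value "1" enables TF32 are assumptions made for this example.

```cpp
#include <cstdlib>
#include <cstring>

// Hypothetical helper: treats HIPBLASLT_ALLOW_TF32="1" as enabled.
// The exact accepted values are an assumption of this sketch.
static bool read_allow_tf32_env() {
    const char* value = std::getenv("HIPBLASLT_ALLOW_TF32");
    return value != nullptr && std::strcmp(value, "1") == 0;
}

// Cached variant: the static const local is initialized on the first call
// only, so later changes to the environment variable are ignored.
bool allow_tf32_cached() {
    static const bool allow_tf32 = read_allow_tf32_env();
    return allow_tf32;
}

// Fresh variant: the variable is re-read on every call, so changes to the
// environment variable during the process lifetime are picked up.
bool allow_tf32_fresh() {
    bool allow_tf32 = read_allow_tf32_env();
    return allow_tf32;
}
```

The trade-off is a `getenv` call on each query in exchange for seeing updates made after the first call.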