Bing Xu
Results: 72 comments by Bing Xu
I think it is possible to add a specialized CUTLASS/CK kernel that accepts uint8 as input and does the cast in the prologue. An unfused alternative is to add a separate cast function.
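To illustrate the difference between the two options, here is a minimal NumPy sketch. It is not CUTLASS or CK code, and every function name in it is hypothetical. The unfused path materializes a converted copy of the uint8 input before the GEMM, while the "fused" path converts each tile as it is loaded, which is roughly what a cast in the kernel prologue would do.

```python
import numpy as np

def cast_u8_to_f16(x_u8):
    # Unfused option: a standalone cast op that writes a full float16 copy.
    return x_u8.astype(np.float16)

def gemm_unfused(a_u8, b_f16):
    # Step 1: separate cast pass over the whole input.
    a_f16 = cast_u8_to_f16(a_u8)
    # Step 2: GEMM on the converted copy, accumulating in float32.
    return a_f16.astype(np.float32) @ b_f16.astype(np.float32)

def gemm_fused_cast(a_u8, b_f16, tile=4):
    # Fused option (emulated): the cast happens per tile as it is loaded,
    # so no converted copy of the full input is ever stored.
    m, k = a_u8.shape
    out = np.zeros((m, b_f16.shape[1]), dtype=np.float32)
    for k0 in range(0, k, tile):
        a_tile = a_u8[:, k0:k0 + tile].astype(np.float32)  # cast on load
        b_tile = b_f16[k0:k0 + tile].astype(np.float32)
        out += a_tile @ b_tile
    return out

# Small illustrative inputs (made up for this sketch).
rng = np.random.default_rng(0)
a = rng.integers(0, 256, size=(3, 8), dtype=np.uint8)
b = rng.integers(-4, 5, size=(8, 5)).astype(np.float16)
same = np.allclose(gemm_unfused(a, b), gemm_fused_cast(a, b))
```

Both paths compute the same product; the fused variant only avoids the extra pass over memory, which is the motivation for doing the cast inside the kernel.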
The current v0.1.1 code breaks ROCm. We are waiting for AMD engineers to finish merging before we tag it. For now, you can check out the v0.1 release tag to use all ROCm features. On...