Shaojie WANG
We use cublasLt to compute linear operations in modern language models such as transformers. ROCm has a similar library called hipblasLt, but hipify does not...
req:
- [x] input/output: nchw->nchw-vecc, nchw-vecc->nchw
- [ ] weight: nchw->chwn-vecc
- [ ] padding transpose: for cases where c%vecc!=0, pad 0 at vecc's tail
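The input/output and padding items above can be illustrated with a small host-side reference. This is only a sketch under stated assumptions, not the tool's actual kernel: it assumes "nchw-vecc" means a blocked layout of shape [N, ceil(C/vecc), H, W, vecc] in which the channel dimension is split into vectors of length vecc, and the function names `nchw_to_nchw_vecc` and `nchw_vecc_to_nchw` are hypothetical. The weight transform (nchw->chwn-vecc) is not covered.

```python
def nchw_to_nchw_vecc(x, n, c, h, w, vecc):
    """Pack a flat NCHW list into an assumed [N, ceil(C/vecc), H, W, vecc] layout.

    When c % vecc != 0, the tail of the last channel vector stays zero.
    """
    c_blocks = (c + vecc - 1) // vecc  # number of channel vectors, rounded up
    out = [0] * (n * c_blocks * h * w * vecc)  # zero-filled, so padding is implicit
    for i_n in range(n):
        for i_c in range(c):
            cb, cv = divmod(i_c, vecc)  # channel block and lane inside the vector
            for i_h in range(h):
                for i_w in range(w):
                    src = ((i_n * c + i_c) * h + i_h) * w + i_w
                    dst = (((i_n * c_blocks + cb) * h + i_h) * w + i_w) * vecc + cv
                    out[dst] = x[src]
    return out


def nchw_vecc_to_nchw(y, n, c, h, w, vecc):
    """Inverse transform: unpack the blocked layout and drop the padded lanes."""
    c_blocks = (c + vecc - 1) // vecc
    out = [0] * (n * c * h * w)
    for i_n in range(n):
        for i_c in range(c):
            cb, cv = divmod(i_c, vecc)
            for i_h in range(h):
                for i_w in range(w):
                    dst = ((i_n * c + i_c) * h + i_h) * w + i_w
                    src = (((i_n * c_blocks + cb) * h + i_h) * w + i_w) * vecc + cv
                    out[dst] = y[src]
    return out
```

A reference like this is mainly useful for checking a GPU kernel's output: packing followed by unpacking should return the original tensor, and every padded lane should hold 0.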
As we continue to optimize the performance and stability of igemmGen, and as this tool can generate more efficient kernels for igemm or direct conv, we may think about how to merge...