Shaojie WANG

Results 4 issues of Shaojie WANG

We use cublasLt to compute linear operations in modern language models such as transformers. ROCm has a similar library called hipblasLt, but hipify does not...

question
BLAS

req:
- [x] input/output: nchw->nchw-vecc, nchw-vecc->nchw
- [ ] weight: nchw->chwn-vecc
- [ ] padding transpose: for cases where c%vecc!=0, pad 0 at vecc's tail
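The input/output transposes and the tail padding can be illustrated with a small reference sketch. It assumes that "nchw-vecc" means a buffer shaped [N, ceil(C/vecc), H, W, vecc], with the vecc channels of each block stored innermost; the issue does not spell this out, so treat the layout as an assumption. The function names are hypothetical, and this is a plain-Python reference for checking results, not a GPU kernel.

```python
def nchw_to_nchw_vecc(src, n, c, h, w, vecc):
    """Repack a flat NCHW buffer into [N, ceil(C/vecc), H, W, vecc] (assumed layout).

    When c % vecc != 0, the unused slots at the tail of the last channel
    block are left as 0, matching the "padding 0 at vecc's tail" item.
    """
    c_blocks = (c + vecc - 1) // vecc  # number of channel blocks, rounded up
    dst = [0] * (n * c_blocks * h * w * vecc)  # zero-filled, so padding is implicit
    for i_n in range(n):
        for i_c in range(c):
            cb, cv = divmod(i_c, vecc)  # channel block index, position inside the block
            for i_h in range(h):
                for i_w in range(w):
                    s = ((i_n * c + i_c) * h + i_h) * w + i_w
                    d = (((i_n * c_blocks + cb) * h + i_h) * w + i_w) * vecc + cv
                    dst[d] = src[s]
    return dst


def nchw_vecc_to_nchw(src, n, c, h, w, vecc):
    """Inverse repack: drop the padded tail and restore a flat NCHW buffer."""
    c_blocks = (c + vecc - 1) // vecc
    dst = [0] * (n * c * h * w)
    for i_n in range(n):
        for i_c in range(c):
            cb, cv = divmod(i_c, vecc)
            for i_h in range(h):
                for i_w in range(w):
                    s = (((i_n * c_blocks + cb) * h + i_h) * w + i_w) * vecc + cv
                    d = ((i_n * c + i_c) * h + i_h) * w + i_w
                    dst[d] = src[s]
    return dst


if __name__ == "__main__":
    # n=1, c=3, h=1, w=2, vecc=2: channel 2 sits alone in the second block.
    packed = nchw_to_nchw_vecc([1, 2, 3, 4, 5, 6], 1, 3, 1, 2, 2)
    print(packed)
    print(nchw_vecc_to_nchw(packed, 1, 3, 1, 2, 2))
```

Under the same assumption, the weight case (nchw->chwn-vecc) would differ only in the destination index, with N moved to sit just before the innermost vecc dimension.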

As we continue optimizing the performance and stability of igemmGen, and since this tool can generate more efficient kernels for igemm or direct conv, we should think about how to merge...