ankutalev
**What is your question?** Hello! I want to implement an elementwise epilogue that depends on the output matrix coordinates, i.e. `d_ij = F(alpha * sum_k(a_ik * b_kj) + c_ij, i, j)`...
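To pin down the semantics being asked for, here is a minimal host-side reference sketch (plain C++, not CUTLASS code) of a GEMM whose epilogue receives the global output coordinates `(i, j)` alongside the value. The function names and the example mask epilogue are hypothetical and chosen only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Epilogue signature: (linear-combination value, row i, column j) -> output.
using Epilogue = std::function<float(float, int, int)>;

// Row-major reference GEMM with a coordinate-aware epilogue:
//   d_ij = F(alpha * sum_k(a_ik * b_kj) + c_ij, i, j)
// a is M x K, b is K x N, c and the returned d are M x N.
std::vector<float> gemm_coord_epilogue(const std::vector<float>& a,
                                       const std::vector<float>& b,
                                       const std::vector<float>& c,
                                       int M, int N, int K, float alpha,
                                       const Epilogue& F) {
  std::vector<float> d(static_cast<std::size_t>(M) * N);
  for (int i = 0; i < M; ++i) {
    for (int j = 0; j < N; ++j) {
      float acc = 0.0f;
      for (int k = 0; k < K; ++k) {
        acc += a[i * K + k] * b[k * N + j];
      }
      // The epilogue sees the linear-combination result plus the global (i, j).
      d[i * N + j] = F(alpha * acc + c[i * N + j], i, j);
    }
  }
  return d;
}

// Hypothetical example epilogue: zero everything above the diagonal
// (a causal-style mask), which cannot be expressed without (i, j).
float lower_triangular_mask(float x, int i, int j) { return j > i ? 0.0f : x; }
```

In a tiled GPU kernel the hard part is that each thread only knows its position inside a tile, so the epilogue must add the tile's global offset to recover the `(i, j)` that this reference loop gets for free.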
I understand that this project is abandoned, but maybe there is still a chance for this to be merged.
[BUG][QST] Hopper Grouped GEMM Fails When Workspace Is Not 64-Byte Aligned, but MinWorkspaceAlignment = 16
**Describe the bug** See title. I expected GroupedGemm to work when the workspace pointer is 16-byte aligned, but it fails with `Got bad cuda status: misaligned address at line: 596` for 16...
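When the workspace is carved out of a larger preallocated buffer, one possible workaround (an assumption based only on the 64-byte figure in the title, not a confirmed fix) is to round the workspace offset up to 64 bytes instead of `MinWorkspaceAlignment`, and to check the pointer before launching. The helpers below are hypothetical names, shown as a minimal sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// True if ptr is aligned to `alignment` bytes.
// Assumes alignment is a power of two.
bool is_aligned(const void* ptr, std::size_t alignment) {
  return (reinterpret_cast<std::uintptr_t>(ptr) & (alignment - 1)) == 0;
}

// Round a byte offset up to the next multiple of `alignment`.
// Assumes alignment is a power of two.
std::size_t align_up(std::size_t offset, std::size_t alignment) {
  return (offset + alignment - 1) & ~(alignment - 1);
}
```

For example, a workspace placed at byte offset 16 of a 64-byte-aligned arena satisfies a 16-byte check but not a 64-byte one; `align_up(16, 64)` moves it to offset 64.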
Hello! This MR adds two things: 1) zero points for the default mode, and 2) GPT-Q [semantics](https://pytorch.org/blog/accelerating-triton/). Closes #2261
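The two zero-point conventions can differ in where the zero point is applied relative to the scale. The sketch below assumes that the "default mode" applies an additive zero after scaling and that GPT-Q semantics subtract the zero point before scaling; the snippet does not state either formula, and the function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Additive zero point, applied after scaling: w = q * s + z
float dequant_scale_then_add_zero(std::int32_t q, float s, float z) {
  return static_cast<float>(q) * s + z;
}

// GPT-Q style zero point, subtracted before scaling: w = (q - z) * s
float dequant_subtract_zero_then_scale(std::int32_t q, float s, float z) {
  return (static_cast<float>(q) - z) * s;
}
```

The two agree when the additive zero equals `-z * s`; for example, with `q = 5`, `s = 0.5`, and a GPT-Q zero of `8`, both give `-1.5` once the additive zero is set to `-4`. Supporting GPT-Q semantics natively avoids having to precompute that per-group conversion.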