Fix ROCm build error
Fix for https://github.com/kvcache-ai/ktransformers/issues/1178
Looks good, with one note: at runtime I hit an exception: undefined symbol: _Z16gptq_marlin_gemmRN2at6TensorES1_S1_S1_S1_S1_lS0_lllib
--- a/csrc/custom_marlin/gptq_marlin/gptq_marlin.cu
+++ b/csrc/custom_marlin/gptq_marlin/gptq_marlin.cu
@@ -74,8 +74,8 @@ namespace gptq_marlin {
 torch::Tensor gptq_marlin_gemm(torch::Tensor& a, torch::Tensor& b_q_weight,
                                torch::Tensor& b_scales, torch::Tensor& g_idx,
                                torch::Tensor& perm, torch::Tensor& workspace,
-                               int64_t num_bits, int64_t size_m, int64_t size_n,
-                               int64_t size_k, bool is_k_full) {
+                               int64_t num_bits, torch::Tensor size_m_tensor, int64_t size_m, int64_t size_n,
+                               int64_t size_k, int sms, bool is_k_full) {
   TORCH_CHECK_NOT_IMPLEMENTED(false,
                               "marlin_gemm(..) requires CUDA_ARCH >= 8.0");
   return torch::empty({ 1, 1 });
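For context, the mangled name in the error can be decoded with c++filt (from binutils) to see the signature the loader expects, which matches the parameter list the diff above adds to the CUDA_ARCH < 8.0 stub:

```shell
# Demangle the missing symbol to reveal the expected C++ signature.
echo '_Z16gptq_marlin_gemmRN2at6TensorES1_S1_S1_S1_S1_lS0_lllib' | c++filt
# → gptq_marlin_gemm(at::Tensor&, at::Tensor&, at::Tensor&, at::Tensor&,
#                    at::Tensor&, at::Tensor&, long, at::Tensor, long, long,
#                    long, int, bool)
```

The extra `at::Tensor` after the first `long` and the `int` before the final `bool` are exactly the `size_m_tensor` and `sms` parameters the stub was missing, which is why the unchanged stub produced an undefined-symbol error at load time.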
I sincerely apologize. This patch only addresses the compilation issue. Since vLLMMarlin wasn't supported, I had commented it out in my local environment. Alternatively, we could move the import vLLMMarlin statement to the actual usage location. I'll update the patch shortly.
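Moving the import to the usage location can be sketched generically in Python. This is an illustrative pattern only, not the actual ktransformers code, and the function and module names are placeholders:

```python
import importlib


def optional_import(module_name):
    """Import a module lazily at the call site.

    Returns the module, or None if it is unavailable, so that merely
    importing the caller's module never fails on builds (e.g. ROCm)
    where the extension cannot be loaded.
    """
    try:
        return importlib.import_module(module_name)
    except ImportError:
        return None


def run_marlin_gemm(*args, **kwargs):
    # Hypothetical usage site: import vLLMMarlin only when it is called.
    vllm_marlin = optional_import("vLLMMarlin")
    if vllm_marlin is None:
        raise RuntimeError("vLLMMarlin is not supported on this build")
    return vllm_marlin.gptq_marlin_gemm(*args, **kwargs)
```

With this shape, the top-level import of the wrapper module always succeeds, and only code paths that actually call the Marlin kernel see the error on unsupported builds.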
@kevin-t-tang Just updated – could you please review? Thanks!