
Fix ROCm build error

Open jiafei96 opened this issue 5 months ago • 3 comments

fix for https://github.com/kvcache-ai/ktransformers/issues/1178

jiafei96 · Jul 22 '25 06:07

Looks good. One suggestion for a runtime exception I hit: `undefined symbol: _Z16gptq_marlin_gemmRN2at6TensorES1_S1_S1_S1_S1_lS0_lllib`. The fallback stub's signature needs to match the updated kernel:

```diff
--- a/csrc/custom_marlin/gptq_marlin/gptq_marlin.cu
+++ b/csrc/custom_marlin/gptq_marlin/gptq_marlin.cu
@@ -74,8 +74,8 @@ namespace gptq_marlin {
 torch::Tensor gptq_marlin_gemm(torch::Tensor& a, torch::Tensor& b_q_weight,
     torch::Tensor& b_scales, torch::Tensor& g_idx,
     torch::Tensor& perm, torch::Tensor& workspace,
-    int64_t num_bits, int64_t size_m, int64_t size_n,
-    int64_t size_k, bool is_k_full) {
+    int64_t num_bits, torch::Tensor size_m_tensor, int64_t size_m, int64_t size_n,
+    int64_t size_k, int sms, bool is_k_full) {
     TORCH_CHECK_NOT_IMPLEMENTED(false,
         "marlin_gemm(..) requires CUDA_ARCH >= 8.0");
     return torch::empty({ 1, 1 });
```

kevin-t-tang · Aug 14 '25 06:08

I sincerely apologize; this patch only addressed the compilation issue. Since vLLMMarlin wasn't supported, I had commented it out in my local environment, which is why I never hit the runtime error. Alternatively, we could move the `import vLLMMarlin` statement to its actual usage location so the symbol is only resolved when the kernel is really needed. I'll update the patch shortly.
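The "move the import to the usage location" idea can be sketched as a lazy lookup. This is a minimal illustration with a hypothetical module name, not the real ktransformers code path:

```python
def load_marlin_gemm():
    """Resolve the GPTQ-Marlin GEMM kernel only when it is first needed.

    Importing inside the function rather than at module top level means a
    build without the kernel (e.g. this ROCm configuration) can still import
    the package; the failure surfaces only if the kernel is actually called.
    """
    try:
        # Hypothetical module/attribute names, for illustration only.
        from ktransformers_marlin_ext import gptq_marlin_gemm
        return gptq_marlin_gemm
    except ImportError:
        # Also raised when the extension .so exists but fails to load,
        # e.g. with the undefined-symbol error reported above.
        return None
```

Callers then check for `None` (or raise a clear error) instead of the whole package failing at import time.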

jiafei96 · Aug 26 '25 06:08

@kevin-t-tang Just updated – could you please review? Thanks!

jiafei96 · Aug 28 '25 08:08