xiaonans
Currently some quantized Hugging Face models store zero-points directly in the int4 datatype, like [Qwen/Qwen2-7B-Instruct-GPTQ-Int4](https://huggingface.co/Qwen/Qwen2-7B-Instruct-GPTQ-Int4) and [Qwen/Qwen2-1.5B-Instruct-AWQ](https://huggingface.co/Qwen/Qwen2-1.5B-Instruct-AWQ). But the weight_only_groupwise_quant_matmul in TensorRT-LLM only supports fp16 zero-points as input, thus...
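For context, a minimal host-side sketch of what converting such zero-points might look like, assuming two unsigned int4 values are packed per byte, low nibble first (a common layout in GPTQ/AWQ checkpoints, but verify against the specific model); the function name and packing order are assumptions for illustration, not TensorRT-LLM API:

```cpp
// Illustrative only, not TensorRT-LLM code. Expands int4 zero-points
// (two per byte, low nibble first) into the fp16 array a kernel that
// expects fp16 zero-points could consume.
#include <cuda_fp16.h>
#include <cstdint>
#include <vector>

std::vector<__half> unpack_int4_zeros_to_fp16(const std::vector<uint8_t>& packed,
                                               size_t num_zeros) {
    std::vector<__half> zeros(num_zeros);
    for (size_t i = 0; i < num_zeros; ++i) {
        uint8_t byte = packed[i / 2];
        // Low nibble holds the even-indexed zero-point, high nibble the odd one.
        uint8_t z = (i % 2 == 0) ? (byte & 0x0F) : (byte >> 4);
        zeros[i] = __float2half(static_cast<float>(z));
    }
    return zeros;
}
```

Note that some GPTQ exports apply an extra offset (e.g. +1) or a symmetric convention to the stored zero-points, so the exact conversion should be checked against the checkpoint format in question.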
**What is your question?** I want to write my own fused fp16xfp16 GEMM kernel with CUTE, but I cannot find a tutorial or sample code with performance comparable to cuBLAS....
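As a baseline before hand-writing a CuTe kernel, one option is to instantiate CUTLASS's higher-level device GEMM for fp16 and benchmark it against cuBLAS. A minimal sketch using the cutlass::gemm::device::Gemm API (not raw CuTe); the Sm80 architecture tag and the problem size are assumptions, and tile shapes are left at library defaults:

```cpp
// fp16 GEMM via the CUTLASS device API -- a reference point to compare
// against cuBLAS, not a CuTe kernel.
#include <cutlass/gemm/device/gemm.h>
#include <cutlass/util/host_tensor.h>

int main() {
    using Gemm = cutlass::gemm::device::Gemm<
        cutlass::half_t, cutlass::layout::RowMajor,     // A
        cutlass::half_t, cutlass::layout::ColumnMajor,  // B
        cutlass::half_t, cutlass::layout::RowMajor,     // C / D
        float,                                          // accumulator
        cutlass::arch::OpClassTensorOp,                 // use Tensor Cores
        cutlass::arch::Sm80>;                           // assumed target arch

    int M = 4096, N = 4096, K = 4096;  // placeholder problem size
    cutlass::HostTensor<cutlass::half_t, cutlass::layout::RowMajor>    A({M, K});
    cutlass::HostTensor<cutlass::half_t, cutlass::layout::ColumnMajor> B({K, N});
    cutlass::HostTensor<cutlass::half_t, cutlass::layout::RowMajor>    C({M, N});
    A.sync_device();
    B.sync_device();
    C.sync_device();

    Gemm gemm_op;
    Gemm::Arguments args({M, N, K},
                         {A.device_data(), K},   // lda for row-major A
                         {B.device_data(), K},   // ldb for column-major B
                         {C.device_data(), N},   // source C
                         {C.device_data(), N},   // destination D
                         {1.0f, 0.0f});          // alpha, beta
    cutlass::Status status = gemm_op(args);
    return status == cutlass::Status::kSuccess ? 0 : 1;
}
```

Timing this against cuBLAS for the target shapes gives a realistic performance bar before dropping down to a custom CuTe kernel.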
Now I'm using CUTLASS in my project. I found that some cases have constraints on the layout, such as requiring input matrix A and output matrix C to be row-major....
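When a kernel configuration only accepts one layout combination, a common workaround is the transpose identity C = A·B implies C^T = B^T·A^T: swap the A/B operands and the M/N dimensions, and a kernel that writes a row-major C effectively produces a column-major result without moving any data. A small self-contained illustration of the identity (plain C++, independent of CUTLASS):

```cpp
// The column-major buffers of A, B, C are exactly the row-major buffers of
// A^T, B^T, C^T, and C^T = B^T * A^T. So a "row-major only" kernel can still
// compute a fully column-major GEMM by swapping operands and M/N.
#include <cassert>
#include <cstdio>
#include <vector>

// Stand-in for a row-major-only kernel: C[MxN] = A[MxK] * B[KxN], all row-major.
void gemm_rowmajor(const float* A, const float* B, float* C, int M, int N, int K) {
    for (int m = 0; m < M; ++m)
        for (int n = 0; n < N; ++n) {
            float acc = 0.f;
            for (int k = 0; k < K; ++k) acc += A[m * K + k] * B[k * N + n];
            C[m * N + n] = acc;
        }
}

int main() {
    const int M = 2, N = 3, K = 4;
    // A (MxK), B (KxN), C (MxN), all stored column-major.
    std::vector<float> A(M * K), B(K * N), C(M * N), C_ref(M * N);
    for (int i = 0; i < M * K; ++i) A[i] = float(i + 1);
    for (int i = 0; i < K * N; ++i) B[i] = float(i - 5);

    // Reference: naive column-major GEMM.
    for (int n = 0; n < N; ++n)
        for (int m = 0; m < M; ++m) {
            float acc = 0.f;
            for (int k = 0; k < K; ++k) acc += A[k * M + m] * B[n * K + k];
            C_ref[n * M + m] = acc;
        }

    // Operand swap: run the row-major kernel on (B, A) with M and N exchanged.
    // No data is copied or transposed; only the interpretation changes.
    gemm_rowmajor(B.data(), A.data(), C.data(), /*M=*/N, /*N=*/M, /*K=*/K);

    for (int i = 0; i < M * N; ++i) assert(C[i] == C_ref[i]);
    std::printf("operand-swap result matches column-major reference\n");
    return 0;
}
```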
**Describe the bug** I modified the threadblock/warp tile shapes and the output datatype in https://github.com/NVIDIA/cutlass/blob/main/test/unit/gemm/device/gemm_s8t_s8n_s32t_tensor_op_s32_sm80.cu, and found that some shapes cause the tests to fail. I changed the ElementOutput to cutlass::half_t and...
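For context, a sketch approximating the kind of modification described: the device-level GEMM from that test (int8 row-major A, int8 column-major B, tensor-op on SM80) with the output element changed to cutlass::half_t. The exact epilogue and tile shapes in the real test file may differ, so the template arguments below are illustrative:

```cpp
// Approximation of the GEMM type in the unit test, with ElementOutput changed
// from int32_t to cutlass::half_t. The tile shapes are examples of the kind a
// reader might experiment with; they are not guaranteed to be valid choices.
#include <cutlass/gemm/device/gemm.h>
#include <cutlass/epilogue/thread/linear_combination.h>

using ElementOutput      = cutlass::half_t;   // modified from int32_t
using ElementAccumulator = int32_t;

using Gemm = cutlass::gemm::device::Gemm<
    int8_t, cutlass::layout::RowMajor,          // A (s8t)
    int8_t, cutlass::layout::ColumnMajor,       // B (s8n)
    ElementOutput, cutlass::layout::RowMajor,   // C / D
    ElementAccumulator,
    cutlass::arch::OpClassTensorOp,
    cutlass::arch::Sm80,
    cutlass::gemm::GemmShape<128, 128, 64>,     // threadblock tile (example)
    cutlass::gemm::GemmShape<64, 64, 64>,       // warp tile (example)
    cutlass::gemm::GemmShape<16, 8, 32>,        // int8 instruction shape on SM80
    cutlass::epilogue::thread::LinearCombination<
        ElementOutput,
        128 / cutlass::sizeof_bits<ElementOutput>::value,  // epilogue vector width
        ElementAccumulator,
        float>>;
```

Since changing the output element also changes the epilogue's vectorization width, one thing worth checking for a failing shape is whether the tile combination still satisfies the kernel's constraints, e.g. via Gemm::can_implement() at runtime.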