cutlass
CUDA Templates for Linear Algebra Subroutines
I see in the [discussion](https://github.com/NVIDIA/cutlass/discussions/427) about the GTC talk (S41606) that you have developed a useful code-gen script; however, I did not find it in the repo. Would you please tell me where I can...
I am trying to load offline-compiled PTX at runtime from the same CUDA source file and launch the kernel using cuLaunchKernel, but examples/16_ampere_tensorop_conv2dfprop fails with driver error code 1. ``` >...
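For reference, a minimal sketch of the driver-API load-and-launch path this question describes, with `kernel.ptx` and `my_kernel` as hypothetical placeholder names; error code 1 is `CUDA_ERROR_INVALID_VALUE`, which often traces back to kernel arguments or a launch configuration that does not match the compiled kernel:

```cpp
#include <cuda.h>
#include <cstdio>

// Minimal driver-API sketch, not specific to any CUTLASS example:
// load offline-compiled PTX at runtime and launch one kernel.
#define CHECK(call)                                          \
  do {                                                       \
    CUresult err = (call);                                   \
    if (err != CUDA_SUCCESS) {                               \
      const char* msg = nullptr;                             \
      cuGetErrorString(err, &msg);                           \
      std::printf("driver error %d: %s\n", (int)err, msg);   \
      return 1;                                              \
    }                                                        \
  } while (0)

int main() {
  CHECK(cuInit(0));
  CUdevice dev;
  CHECK(cuDeviceGet(&dev, 0));
  CUcontext ctx;
  CHECK(cuCtxCreate(&ctx, 0, dev));

  CUmodule mod;
  CHECK(cuModuleLoad(&mod, "kernel.ptx"));   // offline-compiled PTX (placeholder name)
  CUfunction fn;
  CHECK(cuModuleGetFunction(&fn, mod, "my_kernel"));

  int n = 256;
  CUdeviceptr buf;
  CHECK(cuMemAlloc(&buf, n * sizeof(float)));
  void* args[] = { &buf, &n };               // must match the kernel signature exactly

  // Note: large CUTLASS kernels often need dynamic shared memory above the
  // default 48 KB limit, which requires cuFuncSetAttribute with
  // CU_FUNC_ATTRIBUTE_MAX_DYNAMIC_SHARED_SIZE_BYTES before launch;
  // omitting that is a common cause of driver-API launch failures.
  CHECK(cuLaunchKernel(fn, /*grid*/ 1, 1, 1, /*block*/ 256, 1, 1,
                       /*sharedMemBytes*/ 0, /*stream*/ nullptr, args, nullptr));
  CHECK(cuCtxSynchronize());

  CHECK(cuMemFree(buf));
  CHECK(cuModuleUnload(mod));
  CHECK(cuCtxDestroy(ctx));
  return 0;
}
```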
Hello, I would like to implement a custom PyTorch kernel using the CUTLASS 2D convolution. I saw that you released Python scripts in release 2.9 to launch a GEMM kernel...
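As one possible starting point, a minimal sketch of wiring a CUTLASS 2.x device-level GEMM into a PyTorch C++/CUDA extension; the conv2d path is analogous via `cutlass::conv::device::ImplicitGemmConvolution` (see examples/16). The function name `cutlass_gemm` and the default fp32 row-major configuration are assumptions for illustration:

```cpp
// cutlass_ext.cu -- hypothetical PyTorch extension sketch, assuming
// contiguous row-major fp32 tensors and CUTLASS's default SIMT kernel.
#include <torch/extension.h>
#include <cutlass/gemm/device/gemm.h>

torch::Tensor cutlass_gemm(torch::Tensor A, torch::Tensor B) {
  TORCH_CHECK(A.is_cuda() && B.is_cuda(), "inputs must be CUDA tensors");
  TORCH_CHECK(A.dtype() == torch::kFloat32, "sketch assumes fp32");
  int M = A.size(0), K = A.size(1), N = B.size(1);
  auto D = torch::empty({M, N}, A.options());

  using Gemm = cutlass::gemm::device::Gemm<
      float, cutlass::layout::RowMajor,   // A
      float, cutlass::layout::RowMajor,   // B
      float, cutlass::layout::RowMajor>;  // C/D

  // D = alpha * A @ B + beta * C; here C aliases D and beta = 0.
  // (For brevity this launches on the default stream; a real extension
  // should pass PyTorch's current CUDA stream to the operator.)
  Gemm gemm_op;
  cutlass::Status status = gemm_op({
      {M, N, K},
      {A.data_ptr<float>(), K},   // lda = K for row-major A
      {B.data_ptr<float>(), N},   // ldb
      {D.data_ptr<float>(), N},   // ldc (source C)
      {D.data_ptr<float>(), N},   // ldd (destination D)
      {1.0f, 0.0f}                // alpha, beta
  });
  TORCH_CHECK(status == cutlass::Status::kSuccess, "CUTLASS GEMM failed");
  return D;
}

PYBIND11_MODULE(TORCH_EXTENSION_NAME, m) {
  m.def("gemm", &cutlass_gemm, "CUTLASS fp32 GEMM (sketch)");
}
```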
Also, for this [case](https://github.com/NVIDIA/cutlass/blob/master/examples/13_two_tensor_op_fusion/b2b_gemm_f16t_f16n_f16t_tensor_op_f16_sm75.h#L44), I tried some other parameters to verify the result, such as `cutlass::gemm::GemmCoord gemm_f16_sm75_problem_size_0(10, 64, 576); cutlass::gemm::GemmCoord gemm_f16_sm75_problem_size_1(10, 128, 64);`, and it runs OK, ...
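For context, the back-to-back fusion feeds GEMM0's output D0 into GEMM1 as its A operand, so the two `GemmCoord` problem sizes must line up: M must match, and GEMM1's K must equal GEMM0's N (the example additionally constrains N0 to the first kernel's threadblock tile N, which is why arbitrary sizes can fail). A minimal sketch of that constraint, using the sizes quoted above:

```cpp
#include <cassert>
#include <cutlass/gemm_coord.h>

int main() {
  // GemmCoord is (M, N, K). In the b2b fusion, GEMM0's output D0 (M0 x N0)
  // becomes GEMM1's A operand, so M1 == M0 and K1 == N0 must hold.
  cutlass::gemm::GemmCoord problem_size_0(10, 64, 576);   // D0 is 10 x 64
  cutlass::gemm::GemmCoord problem_size_1(10, 128, 64);   // consumes 10 x 64

  assert(problem_size_1.m() == problem_size_0.m());
  assert(problem_size_1.k() == problem_size_0.n());
  return 0;
}
```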
Test code for this version can be found in `examples/37_gemm_layernorm_gemm_fusion/gemm_layernorm_bias_residual.cu`. Things that need to be modified are marked as `TODO`.
This is the original gemm_universal_with_broadcast PR, written in April. The added unit test test/unit/gemm/device/gemm_broadcast_test.cu passed at that time, but it no longer passes.
**Describe the bug** I am trying to do a GEMM between two fp32 arrays using the Python API to produce an fp32 output. I would like to leverage tensor cores...
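On the C++ side, fp32 operands reach the tensor cores through the TF32 path on SM80 and newer; a minimal sketch, assuming CUTLASS's default tensor-op configuration for float operands on Sm80 (the Python API exposes the corresponding choice through its math-operation settings):

```cpp
#include <cutlass/gemm/device/gemm.h>

// Sketch: an fp32 GEMM routed through tensor cores. On SM80+, CUTLASS
// realizes float tensor-op GEMMs with TF32 instructions: inputs are
// rounded to TF32 internally while accumulation stays in fp32.
using GemmTf32 = cutlass::gemm::device::Gemm<
    float, cutlass::layout::RowMajor,    // A
    float, cutlass::layout::RowMajor,    // B
    float, cutlass::layout::RowMajor,    // C/D
    float,                               // accumulator
    cutlass::arch::OpClassTensorOp,      // use tensor cores...
    cutlass::arch::Sm80>;                // ...on Ampere or newer

cutlass::Status run_tf32(int M, int N, int K,
                         float const* A, float const* B,
                         float const* C, float* D) {
  GemmTf32 gemm_op;
  return gemm_op({{M, N, K},
                  {A, K}, {B, N}, {C, N}, {D, N},   // row-major leading dims
                  {1.0f, 0.0f}});                   // alpha, beta
}
```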
I ran example 57_hopper_grouped_gemm with different options and found that performance degrades when beta != 0. For example, run the following command: `./examples/57_hopper_grouped_gemm/57_hopper_grouped_gemm --m=5120 --n=1280 --k=256...`
Referring to #1316, I have tried the 55th example, 55_hopper_mixed_dtype_gemm. It works fine for w4a8 with groupsize=128, which includes changes from the baseline like `using MmaType = int8_t;` and `using ElementC = int32_t;`...
Hi everybody, I'm currently trying to write a trainer for a very small and oddly shaped network which requires a lot of gather/scatter. E.g. one layer looks like this: C...