Haicheng Wu comments

Results: 323 comments of Haicheng Wu

cleanup code in tile iterator example

@kerrmudgeon

[BUG] CUTLASS 3.6 profiler doesn't read Instantiation Level that we pass in (Hopper SM90)

@alihassanijr , could you please help with this?

[DOC] Possible typos in fundamental_types.md document

you are correct. we will fix it next time upstream. thank you for catching this.

[FEA][Inductor-EVT] tanh, sigmoid, exp, gelu are not supported in python evt tracer

@jackkosaian , @apuaaChen could you please take a look?

[FEA][torchinductor-EVT] Allow function source code to be passed directly to EVT tracer

@jackkosaian

[QST] Memory-bound nvfp4 grouped gemm

grouped gemm is supported in the profiler. you could use the cutlass profiler to pick the best kernel. cc += @ANIKET-SHIVAM

Limit the number of SMs (sm_count) to user-provided value during profiling.

do you see a different kernel get picked when changing the sm count?

[QST]Do we need to tune cutlass gemm to use it for all shape?

the warp tile size k should be bigger than the mma instruction k so that we can run multiple mma instructions in the inner loop and use them to hide other latencies. cutlass...
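
To make the relationship concrete, here is a minimal sketch of the arithmetic behind this advice. The helper `mma_k_iterations` and the shapes below are hypothetical, chosen only for illustration; they are not part of the CUTLASS API.

```python
def mma_k_iterations(warp_tile_k: int, instruction_k: int) -> int:
    """Return how many MMA instructions one warp tile issues along K.

    Hypothetical helper for illustration only, not a CUTLASS function.
    """
    if warp_tile_k <= 0 or instruction_k <= 0:
        raise ValueError("tile sizes must be positive")
    if warp_tile_k % instruction_k != 0:
        raise ValueError("warp tile K must be a multiple of instruction K")
    return warp_tile_k // instruction_k


# Assumed example shapes: an MMA instruction with K = 16.
for warp_k in (16, 32, 64):
    print(f"warp K = {warp_k}: {mma_k_iterations(warp_k, 16)} MMA(s) per inner-loop step")
```

With warp K equal to instruction K there is only one MMA per step along K, so the inner loop has no additional MMAs of its own to overlap with other latencies; a larger warp K gives it more independent MMAs to issue.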

[QST] Support fp8 gemm with 128x1 LHS scaling and 1x128 RHS scaling

> does it support fp8 gemm with 128x1 LHS scaling and 1x128 RHS scaling?

yes

[QST] How to pack int4 tensor correctly in PyTorch

maybe ask the pytorch folks about this? cc += @jackkosaian