masahi

This looks similar to https://github.com/apache/tvm/pull/12750, maybe we don't need this? cc @vinx13

I'm assuming this has been superseded by https://github.com/apache/tvm/pull/14036

Please use the forum for general usage questions. https://discuss.tvm.apache.org/

@vpirogov @dzarukin What about GPU? MHA optimizations like [flash attention](https://github.com/HazyResearch/flash-attention) yield significant speed-ups on NVIDIA GPUs for large-sequence-length models (the stable diffusion UNet, etc.). There is [another CUDA...
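For reference, a minimal sketch of what a fused MHA call looks like from the user side, using PyTorch's `scaled_dot_product_attention` (which dispatches to a flash-attention kernel on supported NVIDIA GPUs); the shapes and dtype here are illustrative assumptions, not from the linked discussion:

```python
# Minimal sketch: fused scaled-dot-product attention via PyTorch.
# Assumes PyTorch >= 2.0 and a CUDA device; shapes are illustrative.
import torch
import torch.nn.functional as F

batch, heads, seq_len, head_dim = 2, 8, 4096, 64  # large sequence length
q = torch.randn(batch, heads, seq_len, head_dim, device="cuda", dtype=torch.float16)
k = torch.randn_like(q)
v = torch.randn_like(q)

# One fused kernel computes softmax(Q K^T / sqrt(d)) V without
# materializing the full seq_len x seq_len attention matrix.
out = F.scaled_dot_product_attention(q, k, v)
print(out.shape)  # torch.Size([2, 8, 4096, 64])
```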

Yeah, I've seen incredible performance out of oneDNN conv2d / gemm kernels on Arc GPU. So I'm looking forward to the availability of a fused MHA kernel for better performance....

> Additionally, for really small values of M, you are likely to be b/w bound anyway, for which you can likely get roofline perf from recompiling CUTLASS 2.x Ampere kernels...
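To make the bandwidth-bound claim concrete, here is a back-of-the-envelope arithmetic-intensity check for an fp16 GEMM; the A100 peak numbers are assumptions used purely for illustration:

```python
# Roofline sketch: when is an fp16 GEMM (M x K) @ (K x N) bandwidth bound?
# Assumed machine balance (illustrative): A100 ~312 TFLOP/s fp16, ~2 TB/s HBM.
PEAK_FLOPS = 312e12
PEAK_BW = 2e12
machine_balance = PEAK_FLOPS / PEAK_BW  # ~156 FLOP/byte

def arithmetic_intensity(M, N, K, bytes_per_elem=2):
    flops = 2 * M * N * K
    bytes_moved = bytes_per_elem * (M * K + K * N + M * N)  # read A, B; write C
    return flops / bytes_moved

for M in (16, 64, 128, 512):
    ai = arithmetic_intensity(M, N=4096, K=4096)
    bound = "bandwidth" if ai < machine_balance else "compute"
    print(f"M={M:4d}: intensity ~{ai:6.1f} FLOP/byte -> {bound} bound")
# For small M the intensity is roughly M, well below the machine balance,
# so memory traffic rather than math limits performance.
```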

> The `BLOCK_M >= 128` requirement likely comes from the fact that only `cooperative` kernel-schedule support exists today in CUTLASS 3.5. If support for other kernel schedules is added (`tma_warpspecialized` or `tma_warpspecialized_pingpong`)...

The error is coming from LLVM. I heard that LLVM has recently started to depend on zstd. So you need to install libzstd-dev.
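As a quick sanity check (a sketch, assuming a Linux setup where the dev package provides `libzstd.so`), you can verify that the zstd shared library is visible before rebuilding:

```python
# Sketch: check whether the zstd shared library that LLVM links against
# is visible on this system. On Debian/Ubuntu it comes from libzstd-dev.
import ctypes.util

lib = ctypes.util.find_library("zstd")
if lib is None:
    print("libzstd not found; install it (e.g. libzstd-dev) and rebuild")
else:
    print(f"found {lib}")
```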

For now, only model import is supported for v2.1. The compilation pipeline, including some JS code, might need some changes.

Some thoughts on this problem: to guarantee the soundness of graph-level layout transformation, we need to be able to infer the new layout-sensitive attributes for all ops, 100% reliably. That...
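To illustrate what "inferring the new layout-sensitive attributes" means in practice, here is a toy sketch (a hypothetical helper, not TVM's actual API) that rewrites a conv2d-style attribute dict when the data layout changes from NCHW to NHWC; any op for which no such rule can be written reliably is exactly where a graph-level transform becomes unsound:

```python
# Toy sketch (hypothetical, not TVM's API): rewriting layout-sensitive
# attributes when transforming a conv2d from NCHW to NHWC. A sound
# graph-level layout pass needs an equivalent rule for *every* op.
def convert_conv2d_attrs_nchw_to_nhwc(attrs):
    new_attrs = dict(attrs)
    new_attrs["data_layout"] = "NHWC"
    # The weight layout must change consistently with the data layout.
    new_attrs["kernel_layout"] = "HWIO"
    return new_attrs

attrs = {"data_layout": "NCHW", "kernel_layout": "OIHW",
         "strides": (1, 1), "padding": (1, 1, 1, 1)}
print(convert_conv2d_attrs_nchw_to_nhwc(attrs))
# Ops whose attributes are indexed by axis position (e.g. a generic
# concatenate over axis=1) are where this inference can silently fail.
```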