masahi

This looks similar to https://github.com/apache/tvm/pull/12750, maybe we don't need this? cc @vinx13

I'm assuming this has been superseded by https://github.com/apache/tvm/pull/14036

Please use the forum for general usage questions. https://discuss.tvm.apache.org/

@vpirogov @dzarukin What about GPU? MHA optimizations like [flash attention](https://github.com/HazyResearch/flash-attention) yield significant speed-ups on NVIDIA GPUs for large-sequence-length models (the stable diffusion UNet, etc.). There is [another CUDA...
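For reference, a minimal sketch of what a fused MHA call looks like from the user side, using PyTorch's `scaled_dot_product_attention` (which dispatches to a flash-attention kernel on supported NVIDIA GPUs); the shapes and dtype here are illustrative assumptions, not from the linked discussion:

```python
# Minimal sketch: fused scaled-dot-product attention via PyTorch.
# Assumes PyTorch >= 2.0 and a CUDA device; shapes are illustrative.
import torch
import torch.nn.functional as F

batch, heads, seq_len, head_dim = 2, 8, 4096, 64  # large sequence length
q = torch.randn(batch, heads, seq_len, head_dim, device="cuda", dtype=torch.float16)
k = torch.randn_like(q)
v = torch.randn_like(q)

# One fused kernel computes softmax(Q K^T / sqrt(d)) V without
# materializing the full seq_len x seq_len attention matrix.
out = F.scaled_dot_product_attention(q, k, v)
print(out.shape)  # torch.Size([2, 8, 4096, 64])
```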

Yeah, I've seen incredible performance out of oneDNN conv2d / gemm kernels on Arc GPU. So I'm looking forward to the availability of a fused MHA kernel for better performance....

> Additionally, for really small values of M, you are likely to be b/w bound anyway, for which you can likely get roofline perf from recompiling CUTLASS 2.x Ampere kernels...
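To make the bandwidth-bound claim concrete, here is a back-of-the-envelope arithmetic-intensity check for an fp16 GEMM; the A100 peak numbers are assumptions used purely for illustration:

```python
# Roofline sketch: when is an fp16 GEMM (M x K) @ (K x N) bandwidth bound?
# Assumed machine balance (illustrative): A100 ~312 TFLOP/s fp16, ~2 TB/s HBM.
PEAK_FLOPS = 312e12
PEAK_BW = 2e12
machine_balance = PEAK_FLOPS / PEAK_BW  # ~156 FLOP/byte

def arithmetic_intensity(M, N, K, bytes_per_elem=2):
    flops = 2 * M * N * K
    bytes_moved = bytes_per_elem * (M * K + K * N + M * N)  # read A, B; write C
    return flops / bytes_moved

for M in (16, 64, 128, 512):
    ai = arithmetic_intensity(M, N=4096, K=4096)
    bound = "bandwidth" if ai < machine_balance else "compute"
    print(f"M={M:4d}: intensity ~{ai:6.1f} FLOP/byte -> {bound} bound")
# For small M the intensity is roughly M, well below the machine balance,
# so memory traffic rather than math limits performance.
```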

> The `BLOCK_M >= 128` requirement likely comes from the fact that only `cooperative` kernel-schedule support exists today in CUTLASS 3.5. If support for other kernel schedules is added (`tma_warpspecialized` or `tma_warpspecialized_pingpong`)...

The error is coming from LLVM. I heard that LLVM has recently started to depend on zstd. So you need to install libzstd-dev.
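As a quick sanity check (a sketch, assuming a Linux setup where the dev package provides `libzstd.so`), you can verify that the zstd shared library is visible before rebuilding:

```python
# Sketch: check whether the zstd shared library that LLVM links against
# is visible on this system. On Debian/Ubuntu it comes from libzstd-dev.
import ctypes.util

lib = ctypes.util.find_library("zstd")
if lib is None:
    print("libzstd not found; install it (e.g. libzstd-dev) and rebuild")
else:
    print(f"found {lib}")
```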

For now, only model import is supported for v2.1. The compilation pipeline, including some JS code, might need some changes.

Some thoughts on this problem: to guarantee the soundness of graph-level layout transformation, we need to be able to infer the new layout-sensitive attributes for all ops, 100% reliably. That...
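To illustrate what "inferring the new layout-sensitive attributes" means in practice, here is a toy sketch (a hypothetical helper, not TVM's actual API) that rewrites a conv2d-style attribute dict when the data layout changes from NCHW to NHWC; any op for which no such rule can be written reliably is exactly where a graph-level transform becomes unsound:

```python
# Toy sketch (hypothetical, not TVM's API): rewriting layout-sensitive
# attributes when transforming a conv2d from NCHW to NHWC. A sound
# graph-level layout pass needs an equivalent rule for *every* op.
def convert_conv2d_attrs_nchw_to_nhwc(attrs):
    new_attrs = dict(attrs)
    new_attrs["data_layout"] = "NHWC"
    # The weight layout must change consistently with the data layout.
    new_attrs["kernel_layout"] = "HWIO"
    return new_attrs

attrs = {"data_layout": "NCHW", "kernel_layout": "OIHW",
         "strides": (1, 1), "padding": (1, 1, 1, 1)}
print(convert_conv2d_attrs_nchw_to_nhwc(attrs))
# Ops whose attributes are indexed by axis position (e.g. a generic
# concatenate over axis=1) are where this inference can silently fail.
```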