masahi
TF also says the output is non-deterministic if there are duplicate indices: https://www.tensorflow.org/api_docs/python/tf/compat/v1/scatter_update. JAX too: https://jax.readthedocs.io/en/latest/_autosummary/jax.lax.scatter.html. So the industry clearly favors performance over guaranteed determinism. I highly doubt...
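A minimal sketch (plain NumPy, not TF/JAX code) of why duplicate indices make scatter-update nondeterministic on parallel backends: applied sequentially, the last update to a repeated index wins, but a parallel scatter gives no ordering guarantee, so either colliding update may survive.

```python
import numpy as np

# Sketch: scatter_update semantics with a duplicated index.
ref = np.zeros(4, dtype=np.float32)
indices = [1, 1, 3]            # index 1 appears twice
updates = [10.0, 20.0, 30.0]

for i, u in zip(indices, updates):
    ref[i] = u                 # sequential order: 20.0 overwrites 10.0

# Sequentially, ref[1] == 20.0. A parallel scatter (e.g. on GPU) may
# apply the colliding updates in either order, so ref[1] could be
# 10.0 or 20.0 -- this is the nondeterminism TF and JAX document.
```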
What do you mean by "TVM tile size shape"?
I'm aware that our deformable conv2d support lacks `mask` support. If that is what this PR is about, this is great! Please clean up your change (I see unrelated removal...
> LGTM! When passing paged kv cache, is there any assumption there? e.g., layout

Yes, I added shape and dtype requirements as comments.
@vinx13 Does this CI failure seem like a compilation timeout https://ci.tlcpack.ai/blue/organizations/jenkins/tvm-gpu/detail/PR-16474/30/pipeline? I remember you hit something like this before.
oof. It's not surprising, since I added a new kernel variant (flash decoding) with yet more explicit template instantiations https://github.com/tlc-pack/libflash_attn/pull/9
> To make it work, it is required to apply patch to mlc-relax

We need to merge that one first before this. Also, can the TVM-side change be sent to...
I think for `scaled_dot_product_attention` you need to transpose the second and third axes of qkv. It requires a different layout.

```python
query = torch.rand(1, 8, 16, 32, dtype=torch.float16, device="cuda")
```

...
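For reference, `torch.nn.functional.scaled_dot_product_attention` expects `(batch, num_heads, seq_len, head_dim)`, so a tensor stored as `(batch, seq_len, num_heads, head_dim)` needs its second and third axes swapped first. A minimal shape-only sketch, with NumPy standing in for torch:

```python
import numpy as np

# Assumed storage layout: (batch, seq_len, num_heads, head_dim)
q = np.zeros((1, 16, 8, 32), dtype=np.float16)

# scaled_dot_product_attention wants (batch, num_heads, seq_len, head_dim),
# so swap axes 1 and 2 (torch equivalent: q.transpose(1, 2)).
q_sdpa = np.swapaxes(q, 1, 2)
assert q_sdpa.shape == (1, 8, 16, 32)
```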
Hi @yzh119, I'm wondering if the flashinfer kernel can be implemented over vllm's paged KV cache. Does the item "Support general page table layout" address this?...
> yes we will have a unified interface that's compatible with both vllm and the current page table design.

Great! cc @vinx13 @sunggg

> Batch prefill with paged kv is...