masahi
TF also says the output is non-deterministic if there are duplicate indices: https://www.tensorflow.org/api_docs/python/tf/compat/v1/scatter_update. JAX too: https://jax.readthedocs.io/en/latest/_autosummary/jax.lax.scatter.html. So the industry clearly favors performance over guaranteed determinism. I highly doubt...
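A minimal sketch (plain NumPy, not TF/JAX code) of why duplicate indices make scatter-update nondeterministic on parallel backends: applied sequentially, the last update to a repeated index wins, but a parallel scatter gives no ordering guarantee, so either colliding update may survive.

```python
import numpy as np

# Sketch: scatter_update semantics with a duplicated index.
ref = np.zeros(4, dtype=np.float32)
indices = [1, 1, 3]            # index 1 appears twice
updates = [10.0, 20.0, 30.0]

for i, u in zip(indices, updates):
    ref[i] = u                 # sequential order: 20.0 overwrites 10.0

# Sequentially, ref[1] == 20.0. A parallel scatter (e.g. on GPU) may
# apply the colliding updates in either order, so ref[1] could be
# 10.0 or 20.0 -- this is the nondeterminism TF and JAX document.
```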
What do you mean by "TVM tile size shape"?
I'm aware that our deformable conv2d support lacks `mask` support. If that is what this PR is about, this is great! Please clean up your change (I see unrelated removal...
> LGTM! When passing paged kv cache, is there any assumption there? e.g., layout

Yes, I added shape and dtype requirements as comments.
@vinx13 Does this CI failure seem like a compilation timeout https://ci.tlcpack.ai/blue/organizations/jenkins/tvm-gpu/detail/PR-16474/30/pipeline? I remember you hit something like this before.
oof. It's not surprising, since I added a new kernel variant (flash decoding) with yet more explicit template instantiations https://github.com/tlc-pack/libflash_attn/pull/9
> To make it work, it is required to apply patch to mlc-relax

We need to merge that one first before this. Also, can the TVM-side change be sent to...
I think for `scaled_dot_product_attention` you need to transpose the second and third axes of qkv. It requires a different layout.

```python
query = torch.rand(1, 8, 16, 32, dtype=torch.float16, device="cuda")
```

...
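For reference, `torch.nn.functional.scaled_dot_product_attention` expects `(batch, num_heads, seq_len, head_dim)`, so a tensor stored as `(batch, seq_len, num_heads, head_dim)` needs its second and third axes swapped first. A minimal shape-only sketch, with NumPy standing in for torch:

```python
import numpy as np

# Assumed storage layout: (batch, seq_len, num_heads, head_dim)
q = np.zeros((1, 16, 8, 32), dtype=np.float16)

# scaled_dot_product_attention wants (batch, num_heads, seq_len, head_dim),
# so swap axes 1 and 2 (torch equivalent: q.transpose(1, 2)).
q_sdpa = np.swapaxes(q, 1, 2)
assert q_sdpa.shape == (1, 8, 16, 32)
```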
Hi @yzh119, I'm wondering if the flashinfer kernel can be implemented over vllm's paged KV cache. Does the item "Support general page table layout" address this?...
> yes we will have a unified interface that's compatible with both vllm and the current page table design.

Great! cc @vinx13 @sunggg

> Batch prefill with paged kv is...