tilelang
Would there be an official test/benchmark for a fused DSA kernel?
I found independent kernels for the indexer, top-k selector, and sparse MLA (https://github.com/tile-ai/tilelang/blob/main/examples/deepseek_v32/test_tilelang_example_deepseek_v32.py), for which I am very grateful, but I did not see a fused kernel anywhere.
Is that because the implementation would depend heavily on the setup?
Would the tilelang team consider providing such a kernel for benchmarking and testing?