sglang Update decode kernel benchmark with new Triton backend interface

Open amosyou opened this issue 2 weeks ago • 2 comments

The Triton kernel for decode attention was updated with a new backend interface in #3292, breaking the benchmark code.

This PR corrects the import of should_use_tensor_core and replaces the req_to_token, b_req_idx, and b_seq_len inputs in the benchmark with kv_indptr and kv_indices, which the new interface expects.
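For readers unfamiliar with the two layouts, here is a minimal sketch of how the old inputs might map onto the new ones. It assumes kv_indptr is a CSR-style prefix sum over sequence lengths and kv_indices is the flattened list of KV cache slots; the helper name to_kv_indptr_indices and the sample values are hypothetical, and plain Python lists stand in for the tensors the real benchmark uses.

```python
from itertools import accumulate


def to_kv_indptr_indices(req_to_token, b_req_idx, b_seq_len):
    """Flatten a padded request-to-token table into a CSR-style pair.

    Hypothetical helper for illustration only; not part of sglang.
    """
    # kv_indptr[i]:kv_indptr[i + 1] delimits the KV slots of batch entry i.
    kv_indptr = [0, *accumulate(b_seq_len)]
    kv_indices = []
    for req, seq_len in zip(b_req_idx, b_seq_len):
        # Keep only the valid prefix of each row; the rest is padding.
        kv_indices.extend(req_to_token[req][:seq_len])
    return kv_indptr, kv_indices


# Padded table: one row per request slot, one column per token position.
req_to_token = [
    [10, 11, 12, 0],
    [20, 21, 0, 0],
    [30, 31, 32, 33],
]
b_req_idx = [2, 0]  # the batch uses request slots 2 and 0
b_seq_len = [4, 3]  # valid tokens per batch entry

kv_indptr, kv_indices = to_kv_indptr_indices(req_to_token, b_req_idx, b_seq_len)
print(kv_indptr)   # [0, 4, 7]
print(kv_indices)  # [30, 31, 32, 33, 10, 11, 12]
```

Under this assumption the flattened form carries no padding, so the kernel only needs the two index arrays rather than the full table plus per-request lookups.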

[x] Format your code according to the Code Formatting with Pre-Commit.
[ ] Add unit tests as outlined in the Running Unit Tests.
[ ] Update documentation / docstrings / example tutorials as needed, according to Writing Documentation.
[ ] Provide throughput / latency benchmark results and accuracy evaluation results as needed, according to Benchmark and Profiling and Accuracy Results.
[ ] For reviewers: If you haven't made any contributions to this PR and are only assisting with merging the main branch, please remove yourself as a co-author when merging the PR.
[ ] Please feel free to join our Slack channel at https://slack.sglang.ai to discuss your PR.

Feb 17 '25 02:02 amosyou