Update decode kernel benchmark with new Triton backend interface
Motivation
The Triton kernel for decode attention was updated with a new backend interface in #3292, breaking the benchmark code.
Modifications
Corrected the import for `should_use_tensor_core` and replaced `req_to_token`, `b_req_idx`, and `b_seq_len` with `kv_indptr` and `kv_indices`.
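
For reviewers unfamiliar with the two layouts, here is a rough pure-Python sketch of how the old arguments map onto the new ones. It assumes `req_to_token` is a padded per-request table of KV slot ids, and that `kv_indptr` / `kv_indices` form a CSR-style flattening of it. The helper name `build_kv_index` is hypothetical and not part of the benchmark or the kernel API, which operate on tensors.

```python
from itertools import accumulate


def build_kv_index(req_to_token, b_req_idx, b_seq_len):
    """Flatten a padded request-to-token table into (kv_indptr, kv_indices).

    Hypothetical helper for illustration only; the layout is an assumption.
    """
    # kv_indptr[i]:kv_indptr[i + 1] delimits the KV slots of batch entry i.
    kv_indptr = [0] + list(accumulate(b_seq_len))
    kv_indices = []
    for req_idx, seq_len in zip(b_req_idx, b_seq_len):
        # Only the first seq_len entries of a row are valid; the rest is padding.
        kv_indices.extend(req_to_token[req_idx][:seq_len])
    return kv_indptr, kv_indices


# Three request slots, padded to a maximum context length of 4.
req_to_token = [
    [10, 11, 12, 0],
    [20, 21, 0, 0],
    [30, 31, 32, 33],
]
b_req_idx = [2, 0]  # the batch uses request slots 2 and 0
b_seq_len = [4, 3]  # with 4 and 3 valid tokens respectively

kv_indptr, kv_indices = build_kv_index(req_to_token, b_req_idx, b_seq_len)
print(kv_indptr)   # [0, 4, 7]
print(kv_indices)  # [30, 31, 32, 33, 10, 11, 12]
```

Under this assumption the kernel no longer needs the padded table or a per-request lookup: each batch entry reads a contiguous slice of `kv_indices`.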
Checklist
- [x] Format your code according to the Code Formatting with Pre-Commit.
- [ ] Add unit tests as outlined in the Running Unit Tests.
- [ ] Update documentation / docstrings / example tutorials as needed, according to Writing Documentation.
- [ ] Provide throughput / latency benchmark results and accuracy evaluation results as needed, according to Benchmark and Profiling and Accuracy Results.
- [ ] For reviewers: If you haven't made any contributions to this PR and are only assisting with merging the main branch, please remove yourself as a co-author when merging the PR.
- [ ] Please feel free to join our Slack channel at https://slack.sglang.ai to discuss your PR.