feat: Update logits bitmask kernel to v3

Open syuoni opened this issue 9 months ago • 18 comments

The XGrammar team provides an important insight into the kernel workload: in most cases, the bitmask tensor is either almost full (nearly all bits are 1) or almost empty (nearly all bits are 0).

This PR updates the kernel on main (v2) to the kernel developed in https://github.com/mlc-ai/xgrammar/pull/186 (v3). Compared with v2:

  • Kernel v3 shows ~1.3x and ~2.0x speedups at large batch sizes in the almost-full and almost-empty scenarios, respectively.
  • Kernel v3 is slightly slower in the half-full scenario.

See https://github.com/mlc-ai/xgrammar/tree/main/examples/benchmark#benchmark-apply-token-bitmask-inplace-kernels for more performance numbers, and https://github.com/mlc-ai/xgrammar/pull/186 for more background.
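
For readers unfamiliar with the operation, here is a minimal pure-Python sketch of what applying a token bitmask to logits means. It is not the v2 or v3 CUDA kernel. It assumes the mask is packed into 32-bit words, with bit 1 meaning "token allowed" and bit 0 meaning "set the logit to -inf". The helper names `pack_bitmask` and `apply_token_bitmask_inplace_ref` are hypothetical, and the two early-exit branches are included only to illustrate why almost-full and almost-empty masks can be cheaper to process; they are not a claim about how v3 is implemented.

```python
import math

BITS_PER_WORD = 32  # assumption: the bitmask is packed into 32-bit words
FULL_WORD = (1 << BITS_PER_WORD) - 1


def pack_bitmask(allowed_tokens, vocab_size):
    """Pack allowed token ids into a list of 32-bit words (bit 1 = allowed)."""
    num_words = (vocab_size + BITS_PER_WORD - 1) // BITS_PER_WORD
    words = [0] * num_words
    for tok in allowed_tokens:
        words[tok // BITS_PER_WORD] |= 1 << (tok % BITS_PER_WORD)
    return words


def apply_token_bitmask_inplace_ref(logits, words):
    """Set logits[i] to -inf wherever bit i of the packed mask is 0."""
    for w, word in enumerate(words):
        start = w * BITS_PER_WORD
        end = min(start + BITS_PER_WORD, len(logits))
        if word == FULL_WORD:
            # Every token in this word is allowed: nothing to write.
            continue
        if word == 0:
            # No token in this word is allowed: mask the whole range.
            for i in range(start, end):
                logits[i] = -math.inf
            continue
        # Mixed word: test each bit individually.
        for i in range(start, end):
            if not (word >> (i - start)) & 1:
                logits[i] = -math.inf
    return logits


# Usage: a 40-token vocabulary where only tokens 0 and 33 are allowed.
logits = [0.5] * 40
mask_words = pack_bitmask({0, 33}, vocab_size=40)
apply_token_bitmask_inplace_ref(logits, mask_words)
```

After the call, only `logits[0]` and `logits[33]` remain finite; every other entry is `-inf`, so the masked tokens receive zero probability after softmax.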

syuoni avatar Mar 24 '25 06:03 syuoni

/bot run

syuoni avatar Mar 24 '25 06:03 syuoni

PR_Github #253 [ run ] triggered by Bot

niukuo avatar Mar 24 '25 06:03 niukuo

/bot run

syuoni avatar Mar 24 '25 11:03 syuoni

PR_Github #292 [ run ] triggered by Bot

niukuo avatar Mar 24 '25 11:03 niukuo

PR_Github #253 [ run ] completed with state ABORTED /LLM/main/L0_MergeRequest_PR pipeline #248 completed with status: 'FAILURE'

niukuo avatar Mar 24 '25 11:03 niukuo

PR_Github #292 [ run ] completed with state FAILURE /LLM/main/L0_MergeRequest_PR pipeline #281 completed with status: 'FAILURE'

niukuo avatar Mar 24 '25 13:03 niukuo

/bot run

syuoni avatar Mar 24 '25 14:03 syuoni

PR_Github #310 [ run ] triggered by Bot

niukuo avatar Mar 24 '25 14:03 niukuo

PR_Github #310 [ run ] completed with state SUCCESS /LLM/main/L0_MergeRequest_PR pipeline #294 completed with status: 'FAILURE'

niukuo avatar Mar 24 '25 21:03 niukuo

/bot run

syuoni avatar Mar 25 '25 01:03 syuoni

PR_Github #347 [ run ] triggered by Bot

niukuo avatar Mar 25 '25 01:03 niukuo

PR_Github #347 [ run ] completed with state SUCCESS /LLM/main/L0_MergeRequest_PR pipeline #319 completed with status: 'FAILURE'

niukuo avatar Mar 25 '25 03:03 niukuo

/bot run

syuoni avatar Mar 25 '25 12:03 syuoni

PR_Github #433 [ run ] triggered by Bot

niukuo avatar Mar 25 '25 12:03 niukuo

PR_Github #433 [ run ] completed with state FAILURE /LLM/main/L0_MergeRequest_PR pipeline #371 completed with status: 'FAILURE'

niukuo avatar Mar 25 '25 13:03 niukuo

/bot run

syuoni avatar Mar 25 '25 14:03 syuoni

PR_Github #442 [ run ] triggered by Bot

niukuo avatar Mar 25 '25 14:03 niukuo

PR_Github #442 [ run ] completed with state SUCCESS /LLM/main/L0_MergeRequest_PR pipeline #378 completed with status: 'FAILURE'

niukuo avatar Mar 25 '25 16:03 niukuo

/bot run

syuoni avatar Mar 26 '25 02:03 syuoni

PR_Github #491 [ run ] triggered by Bot

niukuo avatar Mar 26 '25 02:03 niukuo

PR_Github #491 [ run ] completed with state SUCCESS /LLM/main/L0_MergeRequest_PR pipeline #423 completed with status: 'SUCCESS'

niukuo avatar Mar 26 '25 06:03 niukuo

/bot reuse-pipeline

byshiue avatar Mar 26 '25 06:03 byshiue

PR_Github #527 [ reuse-pipeline ] triggered by Bot

niukuo avatar Mar 26 '25 06:03 niukuo

PR_Github #527 [ reuse-pipeline ] completed with state SUCCESS Reusing PR_Github #491 for commit 60fd55d

niukuo avatar Mar 26 '25 06:03 niukuo

/bot reuse-pipeline

byshiue avatar Mar 26 '25 07:03 byshiue

PR_Github #535 [ reuse-pipeline ] triggered by Bot

niukuo avatar Mar 26 '25 07:03 niukuo

PR_Github #535 [ reuse-pipeline ] completed with state SUCCESS Reusing PR_Github #491 for commit ff297de

niukuo avatar Mar 26 '25 07:03 niukuo